Converting from PDF
From: Dragan Nedeljkovich, Zrenjanin, Serbia
Date: 25/03/2017 05:09:51

Forum: Visual FoxPro
Category: Third party products, Miscellaneous
Thread ID: 01649292
Message ID: 01649321

>I have a PDF document from which I need to copy a couple of paragraphs into a Word document
>
>I don't have Adobe. Is there any way of doing this without having to buy software?

I did that once, when I really had to, and it's kind of convoluted.

1) Import the PDF into LibreOffice Impress (the slideshow thingy, for PowerPoint-like documents).
2) Save it in Impress's own format.
3) That's OpenDocument format, which is actually a zip file full of XML files.
4) Extract content.xml.
5) Play with that until you find the tags you want. Not for the faint of heart: there were a lot of nesting levels, and each line is an element of its own, i.e. it doesn't preserve paragraphs but turns each line into a separate element. You have to play with the vertical offsets to work out which lines make up a paragraph and where to break.
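
The "play with vertical offsets" part of step 5 could be sketched roughly as follows. This is only a sketch under assumptions, not the code that was actually used: it assumes a cursor named crsDx with numeric y (in cm) and a text field, filled in document order, and the 0.6 cm threshold and the output file name are made-up values you would have to tune against your own document.

```foxpro
* Sketch: glue consecutive lines into paragraphs by their vertical gap.
* Assumes crsDx (x, y, text) is open and its records are in document order.
* lnMaxGap and the output path are hypothetical; adjust to your document.
LOCAL lcOut, lcPara, lnLastY, lnGap, lnMaxGap, lcCRLF
lnMaxGap = 0.6
lcCRLF   = CHR(13)+CHR(10)
lcOut    = ""
lcPara   = ""
* -1 means "no previous line yet"
lnLastY  = -1

SELECT crsDx
SCAN
	lnGap = crsDx.y - lnLastY
	* a big jump down the page, or a jump back up (new page), ends the paragraph
	IF lnLastY >= 0 AND (lnGap > lnMaxGap OR lnGap < 0) AND NOT EMPTY(lcPara)
		lcOut  = lcOut + lcPara + lcCRLF + lcCRLF
		lcPara = ""
	ENDIF
	* lines of the same paragraph are joined with a single space
	lcPara  = lcPara + IIF(EMPTY(lcPara), "", " ") + ALLTRIM(crsDx.text)
	lnLastY = crsDx.y
ENDSCAN
* flush the last paragraph
IF NOT EMPTY(lcPara)
	lcOut = lcOut + lcPara + lcCRLF
ENDIF
STRTOFILE(lcOut, "mypath\paragraphs.txt")
```

A single fixed threshold is the simplest thing that could work. If the document mixes font sizes or columns, you would probably need to look at x as well, or use a gap relative to the usual line spacing.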

Here's the code:
* delimiters of the text spans
#DEFINE hExStart	[<text:span text:style-name="T]
#DEFINE hExEnd		[</text:span>]

* delimiters of the frames that hold them
#DEFINE hBox1	[<draw:frame draw:style-name="gr]
#DEFINE hBox2	[/draw:frame>]

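* setkey is not shown in this post; presumably a local setup routine, so drop this line if you don't have it
DO setkey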

CREATE CURSOR crsDx	(x N(8,3), Y N(8,3), TEXT VARCHAR(120))

lcText=FILETOSTR("\very long path\content.xml")
lastx=0
lasty=0
c=""
nBoxes=OCCURS(hBox1, lcText)
FOR i=1 TO nBoxes
	lcBox=STREXTRACT(lcText, hBox1, hBox2,i,5)
	newx=VAL(STREXTRACT(lcBox, [svg:x="], "cm"))
	newy=VAL(STREXTRACT(lcBox, [svg:y="], "cm"))
	FOR j=1 TO OCCURS(hExStart, lcBox)
		c=STREXTRACT(lcBox, hExStart, hExEnd, j, 5)
		c=unTag(c)
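		* decode the XML entities; the ampersand entity has to be done last
		c=STRTRAN(c, "&lt;", "<")
		c=STRTRAN(c, "&gt;", ">")
		c=STRTRAN(c, "&amp;", "&")
		* convert UTF-8 to the current code page
		c=STRCONV(STRCONV(c,11),2)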
		IF NOT EMPTY(c)
			o=NEWOBJECT("empty")
			ADDPROPERTY(o, "x", newx)
			ADDPROPERTY(o, "y", newy)
			ADDPROPERTY(o, "text", c)
			INSERT INTO crsDx FROM NAME o
			IF RECNO()%255=0
				WAIT WINDOW TEXTMERGE([<<i>>/<<nBoxes>>]) NOWAIT
			ENDIF
		ENDIF
	ENDFOR
ENDFOR
* now output: one line of text per record; y<27 (cm) was the cutoff that suited my document
c=""
SCAN FOR y<27
	c = c + text + CHR(13)+CHR(10)
ENDSCAN
STRTOFILE(c, "mypath\mydoc.txt")


FUNCTION unTag(tcString)
*[2010/06/24 17:44:25] ndragan - strip html, return text.
LOCAL c, lcTag
#DEFINE hLT "<"
#DEFINE hGT ">"
c=tcString
DO WHILE hLT$c
	lcTag=STREXTRACT(c, hLT, hGT,1,4)
	IF EMPTY(lcTag)
		* a "<" with no closing ">" would otherwise loop forever
		EXIT
	ENDIF
	c=STRTRAN(c, lcTag, "")
ENDDO
RETURN c

Numbers like 27 (the y cutoff, in cm) and the specific tags I was looking for may well be quite different in your case, but this worked well enough for me.
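
One way to find a suitable cutoff for your own document might be to look at how the y values are distributed before filtering. The query below is only a sketch: it assumes the crsDx cursor built by the code above is still open, and the names crsY, nLines and cSample are made up for illustration.

```foxpro
* Sketch: list every distinct y offset with a line count and a sample text,
* so that you can eyeball where the body text starts and ends.
* Assumes crsDx (x, y, text) is still open.
SELECT crsDx.y, COUNT(*) AS nLines, MIN(crsDx.text) AS cSample ;
	FROM crsDx ;
	GROUP BY crsDx.y ;
	ORDER BY crsDx.y ;
	INTO CURSOR crsY
BROWSE
```

A y value near the top or bottom of the page that shows up about once per page is probably a running header or footer, and that is where the cutoff would go.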
