Level Extreme Platform
Converting from PDF
Message
 
To
25/03/2017 05:09:51
Dragan Nedeljkovich (Online)
Now officially retired
Zrenjanin, Serbia
General information
Forum: Visual FoxPro
Category: Third-party products
Miscellaneous
Thread ID: 01649292
Message ID: 01649369
Views: 46
>>I have a PDF document from which I need to copy a couple of paragraphs into a Word document
>>
>>I don't have Adobe. Is there any way of doing this without having to buy software?
>
>I did that once, when I really had to, and it's kind of convoluted.
>
>1) import the pdf into LibreOffice Impress (the slideshow thingy, for powerpoint-like documents)
>2) save that into its own format
>3) that's open document format, which is actually a zip file full of xml files
>4) extract content.xml
>5) play with that until you find the tags you want. Not for the faint of heart: there are a lot of nesting levels and each line is an element of its own, i.e. it doesn't preserve paragraphs but turns every line into a separate element. You have to play with the vertical offsets to recognize which lines make up a paragraph and where to break.
>
>Here's the code
>
>#DEFINE hExStart	[<text:span text:style-name="T]
>#DEFINE hExEnd		[</text:span>]
>
>#DEFINE hBox1	[<draw:frame draw:style-name="gr]
>#DEFINE hBox2	[/draw:frame>]
>
>DO setkey
>
>CREATE CURSOR crsDx	(x N(8,3), Y N(8,3), TEXT VARCHAR(120))
>
>lcText=FILETOSTR("\very long path\content.xml")
>lastx=0
>lasty=0
>c=""
>nBoxes=OCCURS(hBox1, lcText)
>FOR i=1 TO nBoxes
>	lcBox=STREXTRACT(lcText, hBox1, hBox2,i,5)
>	newx=VAL(STREXTRACT(lcBox, [svg:x="], "cm"))
>	newy=VAL(STREXTRACT(lcBox, [svg:y="], "cm"))
>	FOR j=1 TO OCCURS(hExStart, lcBox)
>		c=STREXTRACT(lcBox, hExStart, hExEnd, j, 5)
>		c=unTag(c)
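>		c=STRTRAN(c, "&lt;", "<")
>		c=STRTRAN(c, "&gt;", ">")
>		c=STRTRAN(c, "&amp;", "&")	&& decode &amp; last, so "&amp;lt;" ends up as the literal text "&lt;"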
>		c=STRCONV(STRCONV(c,11),2)
>		IF NOT EMPTY(c)
>			o=NEWOBJECT("empty")
>			ADDPROPERTY(o, "x", newx)
>			ADDPROPERTY(o, "y", newy)
>			ADDPROPERTY(o, "text", c)
>			INSERT INTO crsDx FROM NAME o
>			IF RECNO()%255=0
>				WAIT WINDOW TEXTMERGE([<<i>>/<<nBoxes>>]) NOWAIT
>			ENDIF
>		ENDIF
>	ENDFOR
>ENDFOR
>* now output
>c=""
>SCAN FOR y<27
>	c = c + text+0h0d0a
>ENDSCAN
>STRTOFILE(c, "mypath\mydoc.txt")
>
>
>FUNCTION unTag(tcString)
>*[2010/06/24 17:44:25] ndragan - strip html, return text.
>LOCAL c, lcTag
>#DEFINE hLT "<"
>#DEFINE hGT ">"
>c=tcString
>DO WHILE hLT$c
>	lcTag=STREXTRACT(c, hLT, hGT,1,4)
>	IF EMPTY(lcTag)
>		EXIT	&& a "<" with no matching ">": stop instead of looping forever
>	ENDIF
>	c=STRTRAN(c, lcTag, "")
>ENDDO
>RETURN c
>
>
>Now the numbers like 27 (for y) and the specific tags I was looking for may look quite different in your case... but this worked well enough.
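
For anyone without VFP at hand, steps 3 to 5 above can also be sketched in Python. This is not the code from the quoted post, just a minimal sketch under a few assumptions: the file was saved from Impress in OpenDocument format, every text line sits in its own draw:frame with svg:x and svg:y given in cm (the VFP code assumes the same), and a vertical gap larger than max_gap_cm starts a new paragraph. The function names and the 0.6 cm default are made up for illustration; like the 27 above, the threshold will need tuning per document.

```python
import zipfile
import xml.etree.ElementTree as ET

# Namespace URIs used in OpenDocument content.xml
NS = {
    "draw": "urn:oasis:names:tc:opendocument:xmlns:drawing:1.0",
    "svg": "urn:oasis:names:tc:opendocument:xmlns:svg-compatible:1.0",
}


def qname(prefix, name):
    """Build the '{namespace}name' form that ElementTree expects."""
    return "{%s}%s" % (NS[prefix], name)


def to_cm(value):
    """Parse a length such as '2.54cm'; anything else counts as 0.0."""
    if value and value.endswith("cm"):
        return float(value[:-2])
    return 0.0


def read_content_xml(odf_file):
    """Steps 3 and 4: the document is a zip archive; pull out content.xml."""
    with zipfile.ZipFile(odf_file) as archive:
        return archive.read("content.xml")


def frame_lines(content_xml):
    """Return (x, y, text) for every draw:frame that contains text."""
    root = ET.fromstring(content_xml)
    rows = []
    for frame in root.iter(qname("draw", "frame")):
        # itertext() drops the tags; the parser has already decoded
        # entities such as &lt; and &amp; and the UTF-8 encoding.
        text = "".join(frame.itertext()).strip()
        if text:
            x = to_cm(frame.get(qname("svg", "x")))
            y = to_cm(frame.get(qname("svg", "y")))
            rows.append((x, y, text))
    return rows


def group_paragraphs(rows, max_gap_cm=0.6):
    """Step 5: join consecutive lines into paragraphs by vertical offset.

    A jump of more than max_gap_cm between two lines (an assumed,
    document-specific threshold) is treated as a paragraph break.
    """
    paragraphs = []
    current = []
    last_y = None
    for _x, y, text in rows:
        if current and abs(y - last_y) > max_gap_cm:
            paragraphs.append(" ".join(current))
            current = []
        current.append(text)
        last_y = y
    if current:
        paragraphs.append(" ".join(current))
    return paragraphs
```

Calling group_paragraphs(frame_lines(read_content_xml(path))) with the path of the saved Impress file gives a list of paragraph strings that can be pasted into Word. Filtering rows on y first (the y<27 test in the VFP code) would drop page footers before grouping.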

Thanks
Specialist in Advertising, Marketing, especially Direct Marketing

I run courses in Business Management and Marketing