Converting from PDF - Level Extreme

Converting from PDF

Message

27/03/2017 06:56:46

Colin Northway
Colin Northway Associates
London, United Kingdom

25/03/2017 05:09:51

Dragan Nedeljkovich
Now officially retired
Zrenjanin, Serbia

General information

Forum: Visual FoxPro
Category: Third party products
Title: Re: Converting from PDF
Thread ID: 01649292
Message ID: 01649369

>>I have a PDF document from which I need to copy a couple of paragraphs into a Word document
>>
>>I don't have Adobe - is there any way of doing this without having to buy software?
>
>I did that once, when I really had to, and it's kind of convoluted.
>
>1) import the PDF into LibreOffice Impress (the slideshow thingy, for PowerPoint-like documents)
>2) save that into its own format
>3) that's OpenDocument format, which is actually a zip file full of XML files
>4) extract content.xml
>5) play with that until you find the tags you want. Not for the faint of heart: there were a lot of levels, and each line is an element of its own, i.e. it doesn't preserve paragraphs but rather turns each line into one. You have to play with vertical offsets to recognize which lines make up a paragraph and where to break.
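Steps 3 and 4 above can also be done without a zip tool. What follows is a minimal sketch in Python rather than Visual FoxPro, purely as an illustration. The function name `read_content_xml` and the file name `presentation.odp` are hypothetical, and the sketch assumes the file was saved from Impress in OpenDocument format with `content.xml` encoded as UTF-8.

```python
import zipfile


def read_content_xml(odf_file) -> str:
    """Return the text of content.xml from an OpenDocument file.

    odf_file may be a path or a binary file-like object.
    Assumption: content.xml is UTF-8 encoded.
    """
    with zipfile.ZipFile(odf_file) as archive:
        return archive.read("content.xml").decode("utf-8")


if __name__ == "__main__":
    # "presentation.odp" is a hypothetical file name for the saved Impress document.
    xml_text = read_content_xml("presentation.odp")
    with open("content.xml", "w", encoding="utf-8") as out:
        out.write(xml_text)
```

The resulting `content.xml` is the file that the quoted code reads with `FILETOSTR()`.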
>
>Here's the code
>

>#DEFINE hExStart	[<text:span text:style-name="T]
>#DEFINE hExEnd		[</text:span>]
>
>#DEFINE hBox1	[<draw:frame draw:style-name="gr]
>#DEFINE hBox2	[/draw:frame>]
>
>* setkey is the poster's own routine; it is not shown in the post
>DO setkey
>
>* one row per extracted text span, with the position of its frame in cm
>CREATE CURSOR crsDx	(x N(8,3), Y N(8,3), TEXT VARCHAR(120))
>
>lcText=FILETOSTR("\very long path\content.xml")
>c=""
>nBoxes=OCCURS(hBox1, lcText)
>FOR i=1 TO nBoxes
>	* flags 5 = case-insensitive (1) + include the delimiters (4)
>	lcBox=STREXTRACT(lcText, hBox1, hBox2,i,5)
>	newx=VAL(STREXTRACT(lcBox, [svg:x="], "cm"))
>	newy=VAL(STREXTRACT(lcBox, [svg:y="], "cm"))
>	FOR j=1 TO OCCURS(hExStart, lcBox)
>		c=STREXTRACT(lcBox, hExStart, hExEnd, j, 5)
>		c=unTag(c)
>		* decode XML entities; the ampersand goes last so it is not decoded twice
>		c=STRTRAN(c, "&lt;", "<")
>		c=STRTRAN(c, "&gt;", ">")
>		c=STRTRAN(c, CHR(38)+"amp;", CHR(38))
>		* UTF-8 to the current code page
>		c=STRCONV(STRCONV(c,11),2)
>		IF NOT EMPTY(c)
>			o=NEWOBJECT("empty")
>			ADDPROPERTY(o, "x", newx)
>			ADDPROPERTY(o, "y", newy)
>			ADDPROPERTY(o, "text", c)
>			INSERT INTO crsDx FROM NAME o
>			IF RECNO()%255=0
>				WAIT WINDOW TEXTMERGE([<<i>>/<<nBoxes>>]) NOWAIT
>			ENDIF
>		ENDIF
>	ENDFOR
>ENDFOR
>* now output
>c=""
>SCAN FOR y<27
>	c = c + crsDx.text + CHR(13)+CHR(10)
>ENDSCAN
>STRTOFILE(c, "mypath\mydoc.txt")
>
>
>FUNCTION unTag(tcString)
>*[2010/06/24 17:44:25] ndragan - strip html, return text.
>LOCAL c, lcTag
>#DEFINE hLT "<"
>#DEFINE hGT ">"
>c=tcString
>DO WHILE hLT$c
>	lcTag=STREXTRACT(c, hLT, hGT,1,4)
>	IF EMPTY(lcTag)
>		EXIT	&& a "<" without a closing ">" would otherwise loop forever
>	ENDIF
>	c=STRTRAN(c, lcTag, "")
>ENDDO
>RETURN c
>

>
>Now the numbers like 27 (for y) and the specific tags I was looking for may look quite different in your case... but this worked well enough.
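
The quoted code writes one output line per extracted span and leaves the paragraph breaks to be worked out by hand. One possible way to "play with vertical offsets" is sketched below in Python, not in the poster's Visual FoxPro. It assumes rows shaped like the `crsDx` cursor (x, y, text) in document order. The function name `group_paragraphs` and the `max_gap_cm` threshold of 0.6 are hypothetical and would need tuning per document, much like the 27 in the quoted code.

```python
from typing import List, Tuple

# One extracted line: (x in cm, y in cm, text), like one row of crsDx.
Line = Tuple[float, float, str]


def group_paragraphs(lines: List[Line], max_gap_cm: float = 0.6) -> List[str]:
    """Join consecutive lines into paragraphs based on their vertical distance.

    max_gap_cm is a hypothetical threshold: a larger jump in y starts a new
    paragraph. A y value smaller than the previous one (for example at the
    top of the next page) also starts a new paragraph.
    """
    paragraphs: List[str] = []
    current: List[str] = []
    last_y = None
    for _x, y, text in lines:
        if last_y is not None and (y - last_y > max_gap_cm or y < last_y):
            paragraphs.append(" ".join(current))
            current = []
        current.append(text.strip())
        last_y = y
    if current:
        paragraphs.append(" ".join(current))
    return paragraphs


# Made-up sample rows, only to show the call.
sample = [
    (2.0, 3.0, "First line of one paragraph,"),
    (2.0, 3.5, "continued here."),
    (2.0, 4.8, "A second paragraph."),
]
for paragraph in group_paragraphs(sample):
    print(paragraph)
```

Rows that share the same y (several spans inside one frame) have a gap of zero and therefore stay in the same paragraph.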

Thanks

Specialist in Advertising, Marketing, especially Direct Marketing

I run courses in Business Management and Marketing
