This website won't let me reply at the bottom of the page. I got a CS6 disc and installed it, along with a newer version of Adobe Acrobat Professional. The following code, for instance, now works.
* Extract every word from a PDF through Acrobat's JavaScript bridge
strData = ""
strFileName = "c:\d\pdf\spread2z.pdf"
objApp = CREATEOBJECT("AcroExch.App")
objPDDoc = CREATEOBJECT("AcroExch.PDDoc")
IF objPDDoc.Open(strFileName)
    objjso = objPDDoc.GetJSObject()
    FOR page = 0 TO objPDDoc.GetNumPages() - 1
        wordsCount = objjso.getPageNumWords(page)
        * Word indexes are zero-based, so stop at wordsCount - 1
        FOR i = 0 TO wordsCount - 1
            mWord = objjso.getPageNthWord(page, i)
            IF NOT ISNULL(mWord)
                strData = strData + " " + mWord
            ENDIF
        ENDFOR
    ENDFOR
    objPDDoc.Close()
    STRTOFILE(strData, "adtext.txt")
    MODIFY COMMAND "adtext.txt"
ELSE
    MESSAGEBOX("Problem opening the file!", 16, "VBATools.pl")
ENDIF
Unfortunately, all of the text comes out as one long string, without any of the original table structure found in so many business-oriented PDFs. Would anyone know how to automate Adobe Acrobat to extract the text while somehow retaining the table structure?
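One possible direction, offered only as an untested sketch: the JavaScript for Acrobat API also documents getPageNthWordQuads, which reports where each word sits on the page. If those coordinates can be collected alongside each word, rows and columns could perhaps be rebuilt by grouping words on their vertical position and inserting a tab wherever the horizontal gap is wide. The Python below illustrates only that grouping step. The (text, x_left, x_right, y) tuple layout, the function names, the tolerance values, and the sample data are all assumptions made for illustration, and how the quads come across the COM bridge into VFP has not been checked.

```python
from typing import List, Tuple

# Hypothetical word record: (text, x_left, x_right, y), in page units.
# In a real run these numbers would be derived from getPageNthWordQuads.
Word = Tuple[str, float, float, float]


def group_into_rows(words: List[Word], y_tolerance: float = 3.0) -> List[List[Word]]:
    """Group words with nearly equal y into rows, top of page first."""
    rows = []  # each entry is [reference_y, list_of_words]
    # PDF y coordinates grow upward, so a larger y is higher on the page.
    for word in sorted(words, key=lambda w: -w[3]):
        if rows and abs(rows[-1][0] - word[3]) <= y_tolerance:
            rows[-1][1].append(word)
        else:
            rows.append([word[3], [word]])
    # Order the words of each row from left to right.
    return [sorted(row_words, key=lambda w: w[1]) for _, row_words in rows]


def rows_to_text(words: List[Word], y_tolerance: float = 3.0,
                 column_gap: float = 15.0) -> str:
    """Join words with a space, or a tab where the gap suggests a new column."""
    lines = []
    for row in group_into_rows(words, y_tolerance):
        line = row[0][0]
        for prev, cur in zip(row, row[1:]):
            gap = cur[1] - prev[2]  # left edge of this word minus right edge of previous
            line += ("\t" if gap > column_gap else " ") + cur[0]
        lines.append(line)
    return "\n".join(lines)


# Made-up sample: a header row and one data row of a two-column table.
sample_words = [
    ("Item", 50, 75, 700), ("Qty", 200, 220, 700),
    ("Blue", 50, 72, 685), ("widget", 76, 110, 685.5), ("12", 200, 212, 685),
]
print(rows_to_text(sample_words))
```

The tab-separated result could then be written out with STRTOFILE in place of the single long string. The two thresholds would need tuning per document, and merged cells or wrapped text inside a cell would need more logic than this.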
Thank you,
Steve
>To work with the Berzniker code mentioned above, Adobe Acrobat Professional must be installed on the system; it is a prerequisite for accessing
>its objects, such as AcroExch.App and AcroExch.AVDoc.
>It is like Excel.Application or Word.Application: the program must be installed before you can create and work with its COM objects.
>
>
>try
>oAcroApp = CREATEOBJECT("AcroExch.App")
>catch
>messagebox("Adobe AcroExch.App not installed",16+4096,"error")
>endtry
>
>
>
>If you have Adobe Acrobat Pro installed, you can work with it through automation. Everything you can see and run from the Acrobat Pro menus can also be done programmatically through automation code (including from VFP).
>
>The automation interface is based on JavaScript, and you can see the API documentation here:
>
>JavaScript for Acrobat API Reference: http://www.adobe.com/content/dam/Adobe/en/devnet/acrobat/pdfs/js_api_reference.pdf
>
>There is also some VFP work built on the libHaru library for PDF, as in FoxyPreviewer, and at
>https://sites.google.com/site/pdfclasses/
>
>There are more sources and examples of Adobe automation; simply google "JavaScript for Acrobat".