David,
Beginner question: I downloaded the iSED QuickPDF eval product (need to extract PDF content) but am not sure how to create/instantiate the QuickPDF object from VFP. I know it's simple, but am having trouble figuring it out from the docs, Hackers Guide, etc. Once I instantiate it, I should be all set... Thanks in advance,
- Larry
Carlos,
>I need to read a pdf file and export it to a xml or text file.
>Does somebody know any activex or way to do that?
If the PDF actually contains text and not just an image of text, your job will be easier. Otherwise, you'll need a converter that has OCR capabilities, such as Amyuni's AdLib products (pricey).
However, for extracting actual text from a PDF, you can find multiple toolkits, including several with a COM interface. A few good places to look:
www.pdfzone.com
www.planetpdf.com
I have been researching a different problem related to PDF, so I cannot absolutely recommend a solution for your need at the moment, but one product that I liked a lot is iSED QuickPDF from www.sedtech.com, which can read and write PDF files, including simple changes like adding pages, bookmarks, annotations, and so forth. It's reasonably priced ($59 I think) and has a fully-functional demo download.
I just instantiated the QuickPDF COM component in VFP 8, and IntelliSense showed me these two methods of interest to you:
extractFilePageContent
extractFilePageText
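For anyone trying this from VFP, a call might look roughly like the sketch below. The ProgID "iSED.QuickPDF", the parameter list (file name, password, page number), and the file paths are all assumptions on my part, not taken from the product docs, so check the registry or the Object Browser for the real class name and let IntelliSense confirm the arguments.

```foxpro
* Sketch only: the ProgID and the parameter list are assumptions.
LOCAL loPdf, lcText

* Assumed ProgID; verify it in the registry or the VFP Object Browser
loPdf = CREATEOBJECT("iSED.QuickPDF")

* Assumed arguments: file name, password (empty if none), page number
lcText = loPdf.ExtractFilePageText("C:\Temp\sample.pdf", "", 1)

* Save the extracted text of page 1 to a plain text file (hypothetical path)
STRTOFILE(lcText, "C:\Temp\page1.txt")
```

If CREATEOBJECT() fails with a "class definition not found" error, the component is probably not registered or the ProgID is different from the one guessed here.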
Alternatively, especially if the PDF is consistently in the same kind of format, you can use VFP's low-level file functions to read in the PDF file and parse its structure to find the text (not for the faint of heart). There are some toolkits that convert the PDF structure into something like a document object model that you can then walk to find your info, making the job much easier.
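To give an idea of the low-level route, here is a minimal sketch that reads the whole file with FOPEN()/FREAD() and prints the literal strings that precede the Tj (show text) operator. It assumes a hypothetical file path and, more importantly, uncompressed content streams. Most PDFs compress their streams (FlateDecode), in which case this finds nothing. It also ignores escaped or nested parentheses and the other text operators, so treat it as an illustration, not a parser.

```foxpro
* Sketch: read a PDF with low-level file functions and list the
* literal strings shown with the Tj operator.
* Only works when the page content streams are not compressed.
LOCAL lnHandle, lcPdf, lnCount, lnI, lnStart, lnEnd

lnHandle = FOPEN("C:\Temp\sample.pdf")   && hypothetical path; opens read-only
IF lnHandle < 0
   ? "Could not open the file"
   RETURN
ENDIF

* Read the file in chunks until end of file
lcPdf = ""
DO WHILE NOT FEOF(lnHandle)
   lcPdf = lcPdf + FREAD(lnHandle, 32768)
ENDDO
= FCLOSE(lnHandle)

* Each ") Tj" ends a literal string; search backwards for its "("
lnCount = OCCURS(") Tj", lcPdf)
FOR lnI = 1 TO lnCount
   lnEnd = AT(") Tj", lcPdf, lnI)
   lnStart = RAT("(", LEFT(lcPdf, lnEnd - 1))
   IF lnStart > 0
      ? SUBSTR(lcPdf, lnStart + 1, lnEnd - lnStart - 1)
   ENDIF
ENDFOR
```

Even when this works, the strings come out in the order they appear in the file, which is not necessarily reading order, so a toolkit is usually the saner choice.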