General information
Category: Third party products
>>>Hi All.
>>>
>>>There are so many Adobe conversion products out there that I'm really looking for a recommendation. Is there a product that will convert a PDF into its component pages of text and images? In other words, for each page in a PDF, extract the text to a txt file (e.g. Page1.txt), and if there are images on the page then extract those as well (e.g. Page1_image1.jpg, Page1_image2.jpg, ...).
>>>
>>>Thanks
>>
>>Hi Jos,
>>
>>I have now given PDF Converter a tough task (a document that was image only), which it imported into Word using OCR. I could send you the original and the conversion so you can judge for yourself. The result is far from perfect, but it does create tables where they are present.
>>
>>So now I know that the beautiful result I got yesterday is because that document was stored as text, not as images. I can send you a pdf and doc of that kind as well.
>>
>>Let me know.
>>
>>Alex
>
>Thanks very much Alex. It is not quite what I need. The PDF I have has text and images. I need each page of text as a separate file and each image from each page as a separate image file. I also need to know which page an image belongs to. An easy naming convention would then be page1.txt, page2.txt, ... pageN.txt. And images would be page1_image1.jpg, page1_image2.jpg. In this way one can deconstruct a PDF file into pages of text and associated images. A tall order probably :)
Sounds like you need to automate the program that interprets the PDF.
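
Whatever tool ends up doing the extraction, the output side of the automation is simple. Here is a minimal sketch in Python of the naming convention you describe. It assumes the extraction step has already given you, for each page in order, the page text plus a list of image bytes. The function name write_pages and the (text, images) input shape are my own inventions for illustration, not part of any PDF product, and the .jpg extension assumes the images are already JPEG-encoded.

```python
from pathlib import Path
from typing import Iterable, List, Sequence, Tuple

# Hypothetical input shape: one (text, images) pair per page, in page order.
# How you obtain these from the PDF depends on the tool you automate.
Page = Tuple[str, Sequence[bytes]]


def write_pages(pages: Iterable[Page], out_dir: str) -> List[str]:
    """Write pageN.txt and pageN_imageM.jpg files; return the names written."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written: List[str] = []

    # Pages and images are numbered from 1 to match page1.txt, page1_image1.jpg.
    for page_no, (text, images) in enumerate(pages, start=1):
        txt_path = out / f"page{page_no}.txt"
        txt_path.write_text(text, encoding="utf-8")
        written.append(txt_path.name)

        for img_no, data in enumerate(images, start=1):
            # Assumption: the bytes are already JPEG data; nothing is re-encoded.
            img_path = out / f"page{page_no}_image{img_no}.jpg"
            img_path.write_bytes(data)
            written.append(img_path.name)

    return written
```

Because the page number is part of every file name, you always know which page an image came from. A page with no images just produces its .txt file.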