>>Someone sent us a PDF that was from a scanned image. The quality is poor at best. It is mostly text and that is what I care about. Any way to improve the quality to make it more readable? Using just Adobe Reader 8.1 to view the PDF.
>
>Check PDFToText free utility.
>
>
Re: Extract text from PDF
Thread #1217313 Message #1217317
Hmm, I suspect the PDF file does not contain any text per se that could be extracted via a utility like PDFToText; I'm guessing it contains only the original scanned image. It might be possible to:
- use one of the utilities in XPDF to extract the image from the PDF: http://en.wikipedia.org/wiki/Xpdf
- then, use Optical Character Recognition (OCR) to extract the text from the image
- once the text has been extracted it can be printed/viewed "perfectly" via NotePad, Word etc.
There are some free OCR packages such as SimpleOCR ( http://www.simpleocr.com/ ) that could help here. Some non-free programs such as the full version of Adobe Acrobat have workflow features that help automate this kind of thing if you do it a lot.
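If you would rather script the steps above than click through them, here is a rough Python sketch. It assumes the pdfimages tool from XPDF is on your PATH, and it uses the Tesseract command-line OCR engine as a stand-in, since SimpleOCR is a GUI program; Tesseract is my substitution, not something mentioned above. The function names and the "page" file prefix are made up for illustration, and I have not checked how well any OCR engine copes with a scan as poor as yours.

```python
import subprocess
from pathlib import Path

# Image types pdfimages may write (assumption: raw PBM/PGM/PPM, or JPEG with -j).
IMAGE_SUFFIXES = {".pbm", ".pgm", ".ppm", ".jpg"}


def extract_command(pdf_path, image_root):
    """Build the XPDF call: pdfimages PDF-file image-root."""
    return ["pdfimages", str(pdf_path), str(image_root)]


def ocr_command(image_path):
    """Build the Tesseract call: tesseract imagename outputbase.

    Tesseract appends ".txt" to the output base on its own.
    """
    image_path = Path(image_path)
    return ["tesseract", str(image_path), str(image_path.with_suffix(""))]


def pdf_scan_to_text(pdf_path, work_dir):
    """Extract the scanned images, OCR each one, and return the combined text."""
    work_dir = Path(work_dir)
    work_dir.mkdir(parents=True, exist_ok=True)

    # Step 1: pull the embedded images out of the PDF as page-NNN.* files.
    subprocess.run(extract_command(pdf_path, work_dir / "page"), check=True)

    # Step 2: OCR every extracted image, in file-name order.
    pages = []
    for image in sorted(work_dir.glob("page-*")):
        if image.suffix not in IMAGE_SUFFIXES:
            continue  # skip .txt files left over from an earlier run
        subprocess.run(ocr_command(image), check=True)
        pages.append(image.with_suffix(".txt").read_text(encoding="utf-8"))

    # Step 3: the result is plain text you can open in NotePad, Word etc.
    return "\n".join(pages)
```

Calling pdf_scan_to_text("scan.pdf", "ocr_work") would leave the images and one .txt file per image in the ocr_work folder. The two small command-building functions are kept separate so you can swap in a different OCR program without touching the rest.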
Regards. Al
"Violence is the last refuge of the incompetent." -- Isaac Asimov
"Never let your sense of morals prevent you from doing what is right." -- Isaac Asimov
Neither a despot, nor a doormat, be
Every app wants to be a database app when it grows up