Use of OCR in data importing


Message

From

02/03/1998 15:20:56

Jerry Kreps
Nebraska Dept of Revenue
Lincoln, Nebraska, United States

02/03/1998 14:27:19

Marcus Mason
Bellevue, Washington, United States

General information

Forum: Visual FoxPro
Category: Other
Title: Re: Use of OCR in data importing

Miscellaneous

Thread ID: 00081586
Message ID: 00082142

The 90% figure was offered as an approximation. At times recognition went above 95%, and at other times it dropped to 80-85%. This was on handwritten material. In scanning 106,000 documents, only 925 were not correctly identified. Once a document is correctly identified, the OCRFF software applies the template to it, matching locations on the form to the fields defined in the template. The OCR engine, looking at over 100 fields, or about 2000 characters, would mark around 25-50 as doubtful. However, the engine sometimes "recognizes" a character incorrectly, which can lead to as many as 100 characters being misidentified.
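As a rough check, the figures above can be turned into rates. This is only a sketch: the variable names are illustrative, and the 2000-character count is the approximate per-document figure quoted above, not a measured value.

```python
def rate(part, whole):
    """Return part/whole expressed as a percentage."""
    return 100.0 * part / whole

# Document identification: 925 of 106,000 forms were not correctly identified.
documents_scanned = 106_000
documents_misidentified = 925
identification_rate = rate(documents_scanned - documents_misidentified,
                           documents_scanned)

# Characters per form: "about 2000" (approximate figure from the post).
chars_per_document = 2000

# Characters the engine marked as doubtful: 25 to 50 per form.
doubtful_rates = (rate(25, chars_per_document), rate(50, chars_per_document))

# Worst case mentioned: as many as 100 characters misidentified per form.
worst_case_char_accuracy = 100.0 - rate(100, chars_per_document)

print(f"Document identification rate: {identification_rate:.2f}%")
print(f"Doubtful characters: {doubtful_rates[0]}% to {doubtful_rates[1]}%")
print(f"Character accuracy in the worst case: {worst_case_char_accuracy}%")
```

So form identification was a little over 99%, while the per-character figures are a separate measure from the roughly 90% recognition rate quoted for handwriting.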

When I used FineReader 3.0b to scan in 36 pages of a typewritten legal document, I had fewer than ten recognition errors. In a second case, a 31-page paper had only 6 unknowns or misidentifications. The dictionary in FR3.0b even has part-of-speech recognition. It's an amazing piece of software.

>>I knew I shouldn't have trusted my memory. I looked up the URL of the FineReader 3.0b OCR software. (Make sure it is 3.0b)
>>
>>http://members.home.com/ibtusa/ocr.htm
>>
>>>>I occasionally have to import data from printed reports. A few years ago I looked into scanning these and using OCR to get them into a text file. It wasn't nearly good enough back then. Has anyone tried something similar to this more recently? Any feedback is most appreciated.
>>>>Thank You,
>>>>Marcus.
>>>
>>>I've done a lot of work in that area. We experimented with OCR recognition of forms filled out by hand. We were using a Kodak 9000 scanner at 200dpi and controlling the process with OCRFF (OCR For Forms). The scanned images are sent to an Oracle Jukebox using FileNet. We were getting around 90% recognition rates until the lease on the Kodak expired and we started using a Bell & Howell scanner at 200dpi. Our recognition dropped below 50% and we abandoned the experiment until our own Kodak 9000 arrived.
>>>
>>>The OCR engine is really critical. Recently, I tried FineReader 3.0b from a Russian software house. It is available from bitusa.com. The OCR accuracy is unbelievable! It blows TextBridge98 away. Not only that, it can scan straight into a dbf table, eliminating the need to create an intermediate text file.
>>>The Enterprise edition includes OCXs and DLLs that allow you to add their OCR engine to your software projects.
>
>Thank you for the information. It is greatly appreciated. When you say 90%, do you mean 9 of 10 letters correct or 9 of 10 words? Is this rate with handwritten text? Are the products you suggested geared for analyzing handwriting versus printed text? Do you have any opinion of OmniPage by Caere? I believe that is the correct name. Thanks again.
>Marcus.
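
The letters-versus-words question matters more than it might seem. A minimal sketch, assuming character errors are independent of one another and an average word length of 5 letters (both are assumptions for illustration, not figures from this thread):

```python
def word_accuracy(char_accuracy, word_length):
    """Probability that every character in a word is read correctly,
    assuming each character is recognized independently."""
    return char_accuracy ** word_length

AVERAGE_WORD_LENGTH = 5  # hypothetical average, chosen for illustration

for char_acc in (0.90, 0.95, 0.99):
    word_acc = word_accuracy(char_acc, AVERAGE_WORD_LENGTH)
    print(f"{char_acc:.0%} of letters correct -> "
          f"about {word_acc:.0%} of words correct")
```

Under those assumptions, 9 of 10 letters correct means only about 6 of 10 words come through without an error, so a quoted rate means little until it is clear which unit it counts.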
