Processing OCRed text

Asked Jun 18 '10 at 15:02

Active Mar 01 '12 at 10:33

I am extracting text from OCRed TIFF files using a library and dumping it into a database. The documents are actually forms with fields like NAME, DOB, COUNTRY, etc. Since the OCR does not know the difference between a field's label and its actual value, it just dumps all of the text. Now I have the text in the DB in the following format:

Name: MyName Address: My Address

etc

The next step is to extract values like MyName and My Address from the DB. The document types may vary, so a generic parser might not work.

What would you suggest to deal with this situation? Should I write a different parser for each document type? Could ANTLR help me? If yes, then how? Kindly guide me.

I am working on .NET
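
To make the question concrete, here is a minimal sketch of the per-document-type approach I am considering. It is written in Python for brevity rather than .NET, and it assumes the labels of each form type are known in advance. `FORM_LABELS` and `extract_fields` are hypothetical names made up for this illustration, and the label lists are invented examples, not my real forms.

```python
import re

# Hypothetical label lists, one per document type (assumed, not real form definitions).
FORM_LABELS = {
    "leave_application": ["Name", "Address"],
    "training_request": ["Name", "Course"],
}

def extract_fields(text, labels):
    """Split flat OCR text into {label: value}, using known labels as delimiters."""
    # Try longer labels first so a label that is a prefix of another cannot shadow it.
    ordered = sorted(labels, key=len, reverse=True)
    alternation = "|".join(re.escape(label) for label in ordered)
    # A label only counts when it starts at a word boundary and is followed by a colon.
    pattern = re.compile(r"\b(%s)\s*:\s*" % alternation)
    matches = list(pattern.finditer(text))
    fields = {}
    for i, match in enumerate(matches):
        # A value runs from the end of its label to the start of the next label.
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        fields[match.group(1)] = text[match.end():end].strip()
    return fields

print(extract_fields("Name: MyName Address: My Address",
                     FORM_LABELS["leave_application"]))
```

This only works when the OCR reads the labels and colons correctly, and it does not handle case differences or misrecognized characters, which is part of why I am asking whether a grammar-based tool would be a better fit.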

Volatil3

By that I mean that one document could be a "leave application form" while another could be a "Training Request" form. The two could have different fields. – Volatil3 Jun 18 '10 at 17:55

0 Answers