OCR software or homemade CNN for document processing?

Question

I have a dilemma. Suppose you have only one type of invoice/document and one specific field that you want to extract from it and use somewhere else (that field happens to contain handwritten digits, sometimes written with dashes or slashes). Would you use OCR software or build your own CNN to recognize the digits? What accuracy would you expect from OCR? Would your CNN be more accurate, given that you are only interested in one specific style of digit writing, with specific image dimensions, etc.? What would be better in this situation? Keep in mind that you would not use it anywhere else or for any other handwritten digit recognition task, and that you already have 100k or more documents that have been transcribed into a computer by a human, which you can use for training and testing.

Thank you.

You can [compare online OCR APIs](https://ocr.space/compare-ocr-software) to get a feeling for what you can expect from off-the-shelf OCR software and how it works for your documents. – Fabrice Zaks, Oct 01 '18 at 13:47

marco romelli · Answer 1 · 2018-10-01T10:28:15.533


I would definitely go for a CNN-based solution. Since the structure of your document is consistent:

1. Extract the desired portion of the document with a standard computer vision approach.
2. Train a CNN on an annotated set of a few thousand documents. You should even be able to fine-tune an existing CNN trained on MNIST, which would require fewer training images.
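The extraction step can be sketched as follows. This is a minimal illustration, not a production pipeline: it assumes the field always sits at the same pixel coordinates (the `FIELD_BOX` values are hypothetical), that the scan is a grayscale image with dark ink on light paper, and that characters are separated by at least one ink-free column. It uses plain Python lists so it runs without dependencies; in practice an image library would do the same job, and touching characters would need a more robust segmentation.

```python
# Hypothetical field location as (top, left, height, width) in pixels.
# The real values depend on the invoice layout and the scan resolution.
FIELD_BOX = (1, 1, 3, 7)


def crop(image, box):
    """Return the sub-image covered by box from a row-major list of pixel rows."""
    top, left, height, width = box
    return [row[left:left + width] for row in image[top:top + height]]


def split_characters(field, ink_threshold=128):
    """Split a field into per-character crops at columns that contain no ink.

    Pixels darker than ink_threshold count as ink.
    """
    has_ink = [any(row[x] < ink_threshold for row in field)
               for x in range(len(field[0]))]
    chars, start = [], None
    for x, ink in enumerate(has_ink + [False]):  # sentinel closes the last run
        if ink and start is None:
            start = x  # a new character begins at this column
        elif not ink and start is not None:
            chars.append([row[start:x] for row in field])
            start = None
    return chars


if __name__ == "__main__":
    W, B = 255, 0  # white paper, black ink
    page = [
        [W, W, W, W, W, W, W, W, W],
        [W, B, B, W, W, B, W, W, W],
        [W, B, W, W, W, B, B, W, W],
        [W, B, B, W, W, B, W, W, W],
        [W, W, W, W, W, W, W, W, W],
    ]
    pieces = split_characters(crop(page, FIELD_BOX))
    print(len(pieces), [len(p[0]) for p in pieces])
```

Each crop returned by `split_characters` would then be resized and passed to the classifier, with dashes and slashes treated as extra classes next to the ten digits.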

This approach should give you >99% accuracy without much effort. The accuracy of the OCR solution really depends on which library you use and the preprocessing you implement.
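When comparing the two options, it helps to be explicit about what "accuracy" is measured on. The sketch below assumes the >99% figure refers to single characters, and shows how that translates to whole fields: if the field has several characters and errors are independent (an assumption), the chance of getting the entire field right is lower. The function names are illustrative, and the human-transcribed values from the existing documents serve as labels.

```python
def char_accuracy(predictions, labels):
    """Fraction of characters predicted correctly, compared position by position.

    A field whose predicted length differs from its label is scored over the
    longer of the two lengths, so missing or extra characters count as errors.
    """
    correct = total = 0
    for pred, truth in zip(predictions, labels):
        total += max(len(pred), len(truth))
        correct += sum(p == t for p, t in zip(pred, truth))
    return correct / total if total else 0.0


def field_accuracy(predictions, labels):
    """Fraction of fields whose whole string matches the human transcription."""
    if not labels:
        return 0.0
    return sum(p == t for p, t in zip(predictions, labels)) / len(labels)


def expected_field_accuracy(per_char, n_chars):
    """Field accuracy implied by a per-character accuracy, assuming independent errors."""
    return per_char ** n_chars


if __name__ == "__main__":
    preds = ["12-34", "5/6", "789"]
    truth = ["12-34", "5/8", "789"]
    print(char_accuracy(preds, truth), field_accuracy(preds, truth))
    # Hypothetical 8-character field at 99% per-character accuracy.
    print(round(expected_field_accuracy(0.99, 8), 4))
```

For a hypothetical 8-character field, 99% per character works out to roughly 92% of fields read without any error, so the same metric should be used when benchmarking an OCR library against the CNN.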

edited Oct 01 '18 at 10:28

answered Oct 01 '18 at 10:19


I agree that a homemade solution would be the better option. You said I could use MNIST to train the net, but it's not that easy: MNIST images are 28x28 pixels, if I'm not wrong, and my field has different dimensions. That creates another problem: you can't reuse the fully connected layers, because their number of input neurons is fixed by the input size, and if you throw them out you are left with only the convolutional layers, which is not that easy to work with. – Igor Oct 01 '18 at 11:56
The image dimension is not a problem, since you can resize your images to 28x28, which is enough to recognize digits. In any case, most pretrained networks should work with different input sizes, but then you have to replace the fully connected layers. – marco romelli Oct 01 '18 at 12:08
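
The resizing mentioned in the comments can be sketched as below. This is an illustration under assumptions: the crop is a grayscale image stored as a list of pixel rows, the background is white (255), and the target is 28x28, which is the native size of MNIST images; any other size the chosen network expects works the same way. Padding to a square first is a design choice made here so that a wide crop is not squashed. Nearest-neighbour sampling keeps the code short, though an image library with proper interpolation would give smoother results.

```python
def pad_to_square(image, background=255):
    """Pad the shorter side with background pixels so resizing keeps the aspect ratio."""
    h, w = len(image), len(image[0])
    side = max(h, w)
    top = (side - h) // 2
    left = (side - w) // 2
    rows = [[background] * side for _ in range(top)]
    for row in image:
        rows.append([background] * left + list(row)
                    + [background] * (side - w - left))
    rows += [[background] * side for _ in range(side - h - top)]
    return rows


def resize_nearest(image, out_h, out_w):
    """Resize a row-major grayscale image with nearest-neighbour sampling."""
    in_h, in_w = len(image), len(image[0])
    return [
        [image[y * in_h // out_h][x * in_w // out_w] for x in range(out_w)]
        for y in range(out_h)
    ]


def to_network_input(image, size=28):
    """Pad a character crop to a square and resize it to size x size pixels."""
    return resize_nearest(pad_to_square(image), size, size)
```

Calling `to_network_input(crop)` on a character crop of any shape returns a 28x28 grid that an MNIST-style network can consume without changes to its fully connected layers.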
