
I am working on a handwritten digit recognition problem. Basically, we need to recognize certain handwritten fields in documents, such as amount, account number, mobile number, etc.

Handwritten digit recognition can be divided into two steps:

  1. Digit segmentation
  2. Recognition of segmented digits

For step 2 we can use a pretrained MNIST model, but the problem is how to segment the digits. I tried OpenCV contours, but that only helps when the digits are separated by blank pixels, i.e. when they do not touch each other. Users often write numbers with touching or connected digits.
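The limitation described above can be illustrated without OpenCV. The sketch below is a pure-Python stand-in for contour-based segmentation: it uses 8-connected component labeling rather than `cv2.findContours`, and the function and variable names are hypothetical. Each connected blob of ink becomes one candidate digit, so two digits joined by a stroke come back as a single box.

```python
from collections import deque


def connected_components(img):
    """Label 8-connected ink regions of a binary image (hypothetical helper).

    img: list of rows, 1 = ink, 0 = background.
    Returns bounding boxes (x_min, y_min, x_max, y_max), sorted left to right.
    """
    h, w = len(img), len(img[0])
    seen = [[False] * w for _ in range(h)]
    boxes = []
    for y in range(h):
        for x in range(w):
            if img[y][x] and not seen[y][x]:
                # Breadth-first flood fill starting at this unvisited ink pixel.
                seen[y][x] = True
                queue = deque([(x, y)])
                x0, y0, x1, y1 = x, y, x, y
                while queue:
                    cx, cy = queue.popleft()
                    x0, y0 = min(x0, cx), min(y0, cy)
                    x1, y1 = max(x1, cx), max(y1, cy)
                    # Visit all 8 neighbours that are ink and not yet labeled.
                    for dy in (-1, 0, 1):
                        for dx in (-1, 0, 1):
                            nx, ny = cx + dx, cy + dy
                            if 0 <= nx < w and 0 <= ny < h and img[ny][nx] and not seen[ny][nx]:
                                seen[ny][nx] = True
                                queue.append((nx, ny))
                boxes.append((x0, y0, x1, y1))
    return sorted(boxes)


# Two "digits" separated by blank columns vs. the same two joined by a stroke.
separate = [
    [1, 1, 0, 0, 1, 1],
    [1, 1, 0, 0, 1, 1],
]
touching = [
    [1, 1, 1, 1, 1, 1],
    [1, 1, 0, 0, 1, 1],
]
print(connected_components(separate))  # [(0, 0, 1, 1), (4, 0, 5, 1)]
print(connected_components(touching))  # [(0, 0, 5, 1)]
```

The touching case yields one box covering both digits, which is exactly the failure mode that a per-digit detector is meant to avoid.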

Can anyone suggest deep learning or non-deep-learning methods for this task?

Sample images (three images, not reproduced here)

Jeru Luke
Atinesh

1 Answer


For a deep learning based approach, you can use Mask R-CNN. It is a powerful approach that performs detection, localization, and instance segmentation in a single model, and it can separate objects of different classes even when they are close together.

It generates a bounding box (and a segmentation mask) for each digit and classifies it.

Please look at this repository; my explanation would not do it justice.

It also contains examples for you to learn from. The main thing that may slow you down is that you will have to annotate your images, but with transfer learning you can reduce the amount of annotated data you need for training.
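To give a feel for what annotating a digit involves, here is a minimal sketch that builds one COCO-style instance annotation from a polygon drawn around a digit. COCO JSON is one common format that many Mask R-CNN training pipelines can read; whether the repository mentioned above expects this format is not stated here, so treat it as an assumption. The function name and the mapping of digit `d` to `category_id = d + 1` (leaving 0 free for background) are hypothetical choices.

```python
def coco_annotation(ann_id, image_id, digit, polygon):
    """Build one COCO-style annotation for a single handwritten digit (hypothetical helper).

    polygon: list of (x, y) vertices outlining the digit.
    digit: 0-9; stored as category_id digit + 1 (an assumption, so 0 can mean background).
    """
    xs = [p[0] for p in polygon]
    ys = [p[1] for p in polygon]
    n = len(polygon)
    # Shoelace formula for the area enclosed by the polygon.
    area = abs(sum(xs[i] * ys[(i + 1) % n] - xs[(i + 1) % n] * ys[i] for i in range(n))) / 2
    return {
        "id": ann_id,
        "image_id": image_id,
        "category_id": digit + 1,
        # COCO polygons are flattened as [x1, y1, x2, y2, ...].
        "segmentation": [[c for p in polygon for c in p]],
        # COCO boxes are [x, y, width, height].
        "bbox": [min(xs), min(ys), max(xs) - min(xs), max(ys) - min(ys)],
        "area": area,
        "iscrowd": 0,
    }


# A rectangular outline around a digit "4" in image 7.
ann = coco_annotation(1, 7, 4, [(10, 20), (30, 20), (30, 60), (10, 60)])
print(ann["bbox"], ann["area"])  # [10, 20, 20, 40] 800.0
```

In practice an annotation tool produces these polygons for you; the point is that each digit in each image needs its own labeled outline, which is where the annotation effort goes.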

Here are some more relevant links:

Training MaskRCNN on your own dataset

Implementation on your own dataset

To understand more about MaskRCNN

Transfer Learning

There are certainly better articles on these topics, but these are the ones I've used. Hopefully someone else can suggest better ones.

curse
  • Hello @zibbyboo, thanks for the suggestions. So you are saying that I should train an object detection model like Mask R-CNN to identify the individual digits 0-9 (10 object classes) by preparing a labeled dataset? – Atinesh Jun 23 '20 at 09:44
  • Yes, because MNIST has a single class in every image, and when testing you input a single digit to get an output. Instead, treat your problem as identifying multiple classes in a single image. Once you have predicted all the classes, you can stitch the results together from left to right. For example, if your model identifies the classes in your image as 4, 4, 2, 0, 0, 0, you then need a simple algorithm to combine them into 442000. As for / and -, you can add them as classes too, so that when they are detected they can be either ignored or kept in the output. – curse Jun 24 '20 at 10:06
  • I shall give it a try. Also, once the digits have been detected by the object detection model, they can be arranged using the x-coordinates of their bounding boxes. – Atinesh Jun 24 '20 at 10:44
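The stitching step discussed in these comments (order detections by the x-coordinate of their boxes, drop separator classes such as / and -) can be sketched in a few lines. This assumes a hypothetical detection format of `(label, score, (x_min, y_min, x_max, y_max))` tuples, and the 0.5 score threshold is an arbitrary placeholder, not a recommended value.

```python
def read_number(detections, score_threshold=0.5, ignore=("/", "-")):
    """Turn per-digit detections into a string read left to right (hypothetical helper).

    detections: list of (label, score, (x_min, y_min, x_max, y_max)).
    Low-confidence detections and separator labels in `ignore` are dropped.
    """
    kept = [d for d in detections if d[1] >= score_threshold and d[0] not in ignore]
    # Sort by the left edge of each bounding box.
    kept.sort(key=lambda d: d[2][0])
    return "".join(label for label, _, _ in kept)


# Unordered detections for "442-000", plus one low-confidence false positive.
detections = [
    ("0", 0.97, (120, 5, 140, 40)),
    ("4", 0.99, (0, 4, 18, 42)),
    ("2", 0.95, (45, 6, 62, 40)),
    ("-", 0.90, (64, 20, 74, 24)),
    ("4", 0.98, (22, 5, 40, 41)),
    ("0", 0.96, (76, 5, 96, 40)),
    ("0", 0.94, (98, 6, 118, 39)),
    ("7", 0.20, (50, 0, 60, 10)),
]
print(read_number(detections))  # 442000
```

Sorting on the left edge assumes the field is written on a single line; a multi-line field would first need the boxes grouped into rows (for example by their vertical centers) before sorting each row.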