TensorFlow Object Detection API (SSD + MobileNet) for OCR (detection and reading): poor results on long symbol sequences

Question

I am trying to learn the TensorFlow Object Detection API (SSD + MobileNet architecture) using the task of reading sequences of Arabic numerals as an example. The input consisted of generated images with random digit sequences of varying length, from 1 to 20 digits.

For short sequences (up to 5 characters), detection and reading are perfect. For long sequences the result is terrible: characters are skipped, or several digits are read as one.

What could be the problem? One might suspect some kind of built-in pre-processing, but the network also saw sequences of different lengths during training.

It could be because of a bunch of different reasons. How did you perform train and test data split? Do you see the same issue on the training set as well? — Cerovec, Apr 13 '20 at 09:49
Yes, sure. My whole dataset was generated automatically from Word fonts with different random transformations. Both the test and train sets contain all lengths, and I see the same issue on the train set. — lenkele, Apr 13 '20 at 11:31
Great. And do you train your model to output the sequence or the individual characters? — Cerovec, Apr 13 '20 at 19:01
I have 10 classes in the model, one per digit. Every digit of the sequence in the image is annotated with its class label and its bounding rectangle, so the model's output for any image is a set of rectangles with class labels. Below is a fragment of the CSV file used to build train.record:
filename,width,height,class,xmin,ymin,xmax,ymax
650844.jpg,120,30,7,20,4,30,27
650844.jpg,120,30,6,34,4,46,27
— lenkele, Apr 13 '20 at 20:35
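To make the annotation format concrete, here is a minimal sketch that parses the two quoted rows and rebuilds the digit sequence for each image. Ordering the boxes left to right by xmin is an assumption about how the sequence is meant to be read, and the function name read_sequences is hypothetical, not part of the Object Detection API.

```python
import csv
import io
from collections import defaultdict

# The two rows quoted in the comment; a real file would contain many more.
CSV_TEXT = """filename,width,height,class,xmin,ymin,xmax,ymax
650844.jpg,120,30,7,20,4,30,27
650844.jpg,120,30,6,34,4,46,27
"""


def read_sequences(csv_text):
    """Group boxes by image and join class labels left to right.

    Assumption: the reading order of the digits is the order of xmin.
    """
    boxes = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        boxes[row["filename"]].append((int(row["xmin"]), row["class"]))
    return {
        name: "".join(label for _, label in sorted(items))
        for name, items in boxes.items()
    }


sequences = read_sequences(CSV_TEXT)
print(sequences)
```

The same grouping could be applied to the detector's predicted boxes, so that a predicted string can be compared with the ground-truth string for each sequence length.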
It seems like there's some problem in your training stage (maybe some of your hyperparameters), but I'm not 100% sure. I hope you'll find a solution! — Cerovec, Apr 14 '20 at 10:31
Thank you! I think so too. I suspect the resizer in the model's config file. Maybe this is the issue: image_resizer { fixed_shape_resizer { height: 300 width: 300 } } — lenkele, Apr 14 '20 at 11:54
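The arithmetic behind this suspicion can be sketched as follows. A fixed-shape resize stretches every image to 300x300 regardless of its aspect ratio, so the box coordinates scale independently along each axis. The 120x30 image size and the box come from the CSV fragment in the earlier comment. The 480-pixel-wide image is a hypothetical value, assuming that longer sequences are rendered on wider images (the question does not say). resize_box is an illustrative helper, not an API function.

```python
def resize_box(box, image_size, target_size=(300, 300)):
    """Scale (xmin, ymin, xmax, ymax) as a fixed-shape resize would.

    Each axis is scaled independently, so the aspect ratio is not preserved.
    """
    xmin, ymin, xmax, ymax = box
    width, height = image_size
    target_w, target_h = target_size
    sx = target_w / width
    sy = target_h / height
    return (xmin * sx, ymin * sy, xmax * sx, ymax * sy)


# Box of the digit "7" from the quoted CSV row, in a 120x30 image.
box = resize_box((20, 4, 30, 27), image_size=(120, 30))
box_w = box[2] - box[0]
box_h = box[3] - box[1]
print(box, box_w, box_h)

# Hypothetical: the same digit box in a 480x30 image (assumed width for a
# longer sequence; this value is not given in the question).
wide = resize_box((20, 4, 30, 27), image_size=(480, 30))
wide_w = wide[2] - wide[0]
print(wide, wide_w)
```

Under these assumptions, a 10-pixel-wide digit ends up 25 pixels wide and 230 pixels tall in the short image, and only about 6 pixels wide in the hypothetical long one. Boxes that narrow and elongated may be hard for the default SSD anchors to match, which would be consistent with skipped or merged digits, though this is a guess rather than a confirmed diagnosis.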
