
I'm stuck training a model to recognize characters in images. I'm currently trying to recognize letters in relatively small images (700x50) using the pre-defined Faster R-CNN from the TensorFlow object detection repository. Each image contains up to 13 letters that I want to identify, plus some smaller symbols and letters in the background that don't have to be recognized.

I've already trained some models with adaptations to the configuration files from the TensorFlow model zoo (using Python), and the training metrics (classification precision and loss) look good. However, the box prediction / region proposal is not working for me. When I run the model on images, it only ever finds the first character, or the first and second; the other characters are not found at all. I've already tried tweaking the anchor parameters and other settings, but that's not the core of my question.
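One thing worth checking before digging into the RPN itself is whether the anchors can even cover the targets. A quick back-of-the-envelope sketch: the shapes below use what I believe are the `grid_anchor_generator` defaults in the Faster R-CNN configs (base size 256, scales 0.25 to 2.0, aspect ratios 0.5 to 2.0) -- substitute the values from your own config, since these are an assumption:

```python
import numpy as np

# Sketch: enumerate the anchor shapes the first stage places at every
# feature-map cell.  Base size, scales and aspect ratios are the assumed
# grid_anchor_generator defaults -- replace with your config's values.

def anchor_shapes(base_size=256, scales=(0.25, 0.5, 1.0, 2.0),
                  aspect_ratios=(0.5, 1.0, 2.0)):
    """Return (height, width) in pixels for each anchor shape."""
    shapes = []
    for s in scales:
        for ar in aspect_ratios:
            # aspect_ratio = width / height; area = (base_size * s)^2
            h = base_size * s / np.sqrt(ar)
            w = base_size * s * np.sqrt(ar)
            shapes.append((h, w))
    return shapes

# How many of the 12 anchor shapes even fit inside a 700x50 image?
fits = [(h, w) for h, w in anchor_shapes() if h <= 50 and w <= 700]
```

With these assumed defaults only a single anchor shape (roughly 45x91 px) fits inside a 50 px tall image, which would starve the RPN of usable anchors for letter-sized objects; scaling the base size or scales down to letter height may matter more than the number of predictions.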

My question now is: how can I output the boxes / anchors predicted by the region proposal network (RPN) in my model? I would like to understand what is happening and why the other letters are not even found, let alone classified correctly, so I can work out how to change the model. But to find that out, I need to see what the RPN is producing, since the model keeps finding only the first two letters even though I've already tried changing many options such as the anchor sizes and the maximum number of predictions.
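Not an authoritative answer, but two directions that have worked for me. If you build the model in Python via the API's `model_builder`, the `FasterRCNNMetaArch.predict()` call returns a dict of intermediate tensors that (in the versions I've looked at) includes keys like `proposal_boxes_normalized`, which is exactly the RPN output after first-stage NMS. If you only have an exported frozen graph, you can instead list its operations and fetch a proposal tensor by name. The keyword list below is an assumption -- exact op names depend on the export, so inspect the matches manually:

```python
# Sketch: locate likely RPN / proposal operations in a list of graph op
# names.  The keywords are guesses based on how the TF object detection
# API names its first-stage ops; verify the matches yourself.

def find_candidate_ops(op_names,
                       keywords=("proposal", "rpn", "anchor", "firststage")):
    """Return op names that plausibly belong to the region-proposal stage."""
    return [name for name in op_names
            if any(k in name.lower() for k in keywords)]

# Usage against a frozen TF1 graph (requires tensorflow):
#
#   import tensorflow as tf
#   graph_def = tf.compat.v1.GraphDef()
#   with tf.io.gfile.GFile("frozen_inference_graph.pb", "rb") as f:
#       graph_def.ParseFromString(f.read())
#   with tf.Graph().as_default() as graph:
#       tf.import_graph_def(graph_def, name="")
#   names = [op.name for op in graph.get_operations()]
#   print(find_candidate_ops(names))
#   # then add graph.get_tensor_by_name(<match> + ":0") to your sess.run
#   # fetches alongside the usual detection outputs
```

Fetching the proposal tensor alongside the normal outputs lets you compare what the RPN offers with what survives the second stage.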

If someone has the magic answer for how to output the proposals of the RPN in the TensorFlow Faster R-CNN model, so I can find out why they don't make it into the final results, that would be great. But I'm equally happy for tips on how to proceed from here, e.g. building an R-CNN myself instead of using the models from the TensorFlow zoo, or anything else. Since I'll be working on this model for a few more months, any tips on how to get deeper into building a better model are appreciated.
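Whichever route gets you the proposals, burning them into the image makes it obvious at a glance where the RPN is (and is not) looking. A minimal dependency-free sketch, assuming boxes are already de-normalized to pixel coordinates in the `(ymin, xmin, ymax, xmax)` order the object detection API uses:

```python
import numpy as np

def draw_boxes(image, boxes, value=255):
    """Draw 1-px rectangle outlines onto an HxW grayscale image in place.

    boxes: iterable of (ymin, xmin, ymax, xmax) in pixel coordinates.
    """
    h, w = image.shape[:2]
    for ymin, xmin, ymax, xmax in boxes:
        ymin, xmin = max(int(ymin), 0), max(int(xmin), 0)
        ymax, xmax = min(int(ymax), h - 1), min(int(xmax), w - 1)
        image[ymin, xmin:xmax + 1] = value   # top edge
        image[ymax, xmin:xmax + 1] = value   # bottom edge
        image[ymin:ymax + 1, xmin] = value   # left edge
        image[ymin:ymax + 1, xmax] = value   # right edge
    return image

# Example on a blank 700x50 canvas; with real data, pass your image array
# and the fetched proposal boxes instead.
canvas = np.zeros((50, 700), dtype=np.uint8)
draw_boxes(canvas, [(5, 10, 45, 60)])
```

If the drawn proposals cluster on the first one or two letters, the problem is in the first stage (anchors/RPN); if proposals cover all letters but the final detections don't, the problem is in the second-stage scoring or NMS.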

Thanks in advance.
