How to make Tesseract OCR distinguish letters from numbers

Question

I have a picture of a plate with the text "991AAA". I binarized it with OpenCV, found the relevant contours, and passed them to Tesseract OCR. But it reads the plate as "ЧЧЛААА" (with the rus language; I guess '9' looks a bit like 'Ч', but still, no). Is there a problem with my Tesseract config ("-l rus+eng --psm 10")? Am I missing something? The image looks good, and if I give it just the piece with a single "9" it reads it correctly, so I don't think the image is the problem. How can I improve this? P.S. If I change the config to "-l eng --psm 10", it gives me "O9-]AAA". Well, at least a single "9" is recognized.
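One thing that might be worth trying with this kind of config is restricting the character set per position. The sketch below is only an illustration, not a confirmed fix: it assumes the plate always has three digits followed by three Latin letters, that each character has already been cropped into its own image (which is what "--psm 10" expects), and that the pytesseract wrapper is used. The names `read_plate`, `config_for`, and the `"dddlll"` pattern string are hypothetical. As far as I know, `tessedit_char_whitelist` is ignored by the LSTM engine in Tesseract 4.0 and works again from 4.1, so the result depends on the installed version.

```python
DIGITS = "0123456789"
LETTERS = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"


def config_for(kind):
    """Build a Tesseract config string for one character position.

    kind is "d" for a digit position and "l" for a letter position
    (a hypothetical convention used only in this sketch).
    """
    whitelist = DIGITS if kind == "d" else LETTERS
    return f"-l eng --psm 10 -c tessedit_char_whitelist={whitelist}"


def read_plate(char_images, pattern="dddlll", ocr=None):
    """OCR one cropped image per character, using a whitelist per position."""
    if ocr is None:
        # Imported lazily so the helper can be tried with a stand-in OCR function.
        import pytesseract
        ocr = pytesseract.image_to_string
    if len(char_images) != len(pattern):
        raise ValueError("expected one image per pattern position")
    chars = []
    for image, kind in zip(char_images, pattern):
        text = ocr(image, config=config_for(kind)).strip()
        # Keep the first recognized character, or "?" if nothing was read.
        chars.append(text[:1] or "?")
    return "".join(chars)
```

Here `char_images` would be the six cropped contours, ordered left to right. If the layout of the plate differs, the pattern string has to be adjusted accordingly.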

In general, I don't think there is an easy way to do that. If you switch to the C++ API, you can get a RIL_SYMBOL iterator, go through the alternatives for each symbol, and make sure they follow the pattern. With earlier versions of Tesseract (without LSTM) you could specify a user pattern, but that alone wouldn't have solved your issue anyway, because it only slightly increases the probability of getting what you expect. I think that for plate recognition you would be better off with a neural network, because you have a single font and plenty of training data online. — Dmitrii Z., Dec 12 '18 at 21:23
@DmitriiZ. I have zero experience with neural networks and don't have much time to finish the project. Should I try to train Tesseract, or is it better to find another solution for recognition? — , Dec 16 '18 at 13:33
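A much cruder stand-in for the "make sure they follow the pattern" idea from the comment above is to post-process the recognized string instead of iterating over symbol alternatives. The sketch below is not the C++ iterator approach and not a Tesseract feature: it assumes a fixed layout of three digits followed by three Latin letters, and the lookalike tables are guesses built from the misreads in the question ('Ч' for '9', 'Л' for '1') plus a few common confusions. The names `fix_plate`, `TO_DIGIT`, and `TO_LETTER` are hypothetical.

```python
DIGITS = "0123456789"
LETTERS = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"

# Hypothetical lookalike tables; extend them from the errors you actually see.
TO_DIGIT = {
    "\u0427": "9",  # Cyrillic Che, misread for 9 in the question
    "\u041b": "1",  # Cyrillic El, misread for 1 in the question
    "\u041e": "0",  # Cyrillic O
    "O": "0", "I": "1", "Z": "2", "S": "5", "B": "8",
}
TO_LETTER = {
    "\u0410": "A",  # Cyrillic A looks identical to Latin A
    "0": "O", "1": "I", "2": "Z", "5": "S", "8": "B",
}


def fix_plate(text, pattern="dddlll"):
    """Force OCR output into the expected digit/letter layout.

    Returns None when the length does not match the pattern; characters
    that cannot be mapped into the expected class become "?".
    """
    text = text.strip()
    if len(text) != len(pattern):
        return None
    fixed = []
    for char, kind in zip(text, pattern):
        if kind == "d":
            char = TO_DIGIT.get(char, char)
            fixed.append(char if char in DIGITS else "?")
        else:
            char = TO_LETTER.get(char, char)
            fixed.append(char if char in LETTERS else "?")
    return "".join(fixed)
```

This only corrects characters that were confused with a known lookalike. Output with the wrong number of characters, such as the "O9-]AAA" result from the question, is rejected rather than repaired.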

