PyTesseract - blacklisting chars in a specific position

Asked Jan 06 '22 at 11:45

Active Jan 06 '22 at 14:11

I am working in Python using PyTesseract and OpenCV.

I have a photo that contains a mix of numbers and letters. The photo shows a date in the format DDMMMYY, e.g. 01JAN22. Tesseract has trouble telling 0 from O, along with a few other letter/digit mix-ups.

Is there a way to blacklist or whitelist characters for specific positions in the string? I know I can blacklist or whitelist characters for the whole image_to_string call using config="-c tessedit_char_blacklist=".

For example, for char[0], whitelist 0-3 (since it is the first digit of the day, it can only be 0, 1, 2 or 3).

The image below is an example of what I am working with. Currently Tesseract returns OSJUNZ2, which is very close to 05JUN22.

Thanks for your help

Example Image

edited Jan 06 '22 at 14:11

Christoph Rackwitz


asked Jan 06 '22 at 11:45

Bigred

I have created a hacky solution whereby I accept O as a 0, and so on for other letters. Obviously not ideal. I would still love to hear if anyone has a better solution. – Bigred Jan 07 '22 at 08:02
If the date format is always DDMMMYY then your solution is pretty good. Just postprocess the result: you can keep a dict that maps letters to digits for the DD and YY parts, so O becomes 0, S becomes 5, Z becomes 2, I becomes 1, etc. – Octo Poulos Sep 15 '22 at 13:45
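
A minimal sketch of the post-processing idea from the comments, assuming the string is always exactly seven characters in DDMMMYY form. The function name fix_date and both confusion tables are hypothetical and only cover a handful of likely mix-ups, so extend them to match what you actually see from Tesseract. The reverse table (digits misread in the month part) goes slightly beyond what the comment suggests.

```python
# Hypothetical confusion tables; adjust to the errors seen in practice.
TO_DIGIT = {"O": "0", "Q": "0", "I": "1", "L": "1", "Z": "2", "S": "5", "B": "8"}
TO_LETTER = {"0": "O", "1": "L", "5": "S", "6": "G", "8": "B"}

MONTHS = {"JAN", "FEB", "MAR", "APR", "MAY", "JUN",
          "JUL", "AUG", "SEP", "OCT", "NOV", "DEC"}


def fix_date(raw):
    """Correct per-position OCR mix-ups in a DDMMMYY string."""
    # Drop whitespace/newlines that OCR output often carries, and normalise case.
    text = "".join(raw.split()).upper()
    if len(text) != 7:
        raise ValueError(f"expected 7 characters, got {text!r}")

    chars = []
    for i, ch in enumerate(text):
        # Positions 2-4 are the month letters; all others must be digits.
        table = TO_LETTER if 2 <= i <= 4 else TO_DIGIT
        chars.append(table.get(ch, ch))
    fixed = "".join(chars)

    # Sanity-check the result so a bad read fails loudly instead of silently.
    day, month, year = fixed[:2], fixed[2:5], fixed[5:]
    if not (day.isdigit() and 1 <= int(day) <= 31):
        raise ValueError(f"bad day in {fixed!r}")
    if month not in MONTHS:
        raise ValueError(f"bad month in {fixed!r}")
    if not year.isdigit():
        raise ValueError(f"bad year in {fixed!r}")
    return fixed


print(fix_date("OSJUNZ2"))  # 05JUN22
```

You would call this on the raw string returned by image_to_string. It does not restrict what Tesseract recognises at each position; it only repairs the output afterwards, so ambiguous reads that are not in the tables will still raise ValueError.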
