
I am working in Python using PyTesseract and OpenCV.

I have a photo containing a mix of numbers and letters. The photo shows a date in the format DDMMMYY, e.g. 01JAN22. Tesseract has trouble telling 0 from O, along with a few other letter/digit mix-ups.

Is there a way to blacklist / whitelist characters for specific positions in the string? I know I can blacklist / whitelist characters for the whole image_to_string call using config="-c tessedit_char_blacklist=".

For example: for char[0], whitelist 0-3 (as it's a date, the first character can only be 0, 1, 2 or 3).
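For reference, the whole-image whitelist mentioned above could be set up roughly as follows. This is only a minimal sketch: `build_whitelist_config` and `read_date_text` are hypothetical helper names, and `--psm 7` (treat the image as a single text line) is an assumption about the input rather than something I have verified on this photo.

```python
import string

# Characters that can appear in a DDMMMYY date: digits and uppercase letters.
DATE_CHARS = string.digits + string.ascii_uppercase


def build_whitelist_config(allowed=DATE_CHARS, psm=7):
    """Build a Tesseract config string that restricts output to `allowed`.

    psm=7 (single text line) is an assumption about the image layout.
    """
    return f"--psm {psm} -c tessedit_char_whitelist={allowed}"


def read_date_text(image):
    """Run OCR on an image (e.g. one loaded with cv2.imread) using the whitelist."""
    import pytesseract  # imported lazily so the config helper works without it

    return pytesseract.image_to_string(image, config=build_whitelist_config()).strip()
```

Because the same character set applies to every position, this still allows both 0 and O everywhere, which is exactly the limitation I am trying to get around.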

The image below is an example of what I am working with. Currently, Tesseract returns OSJUNZ2, which is very close to 05JUN22.

Thanks for your help

Example Image

Christoph Rackwitz
Bigred
  • I have created a hacky solution whereby I accept O as a 0, and so on for other letters. Obviously, that is not ideal. I would still love to hear if anyone has a better solution. – Bigred Jan 07 '22 at 08:02
  • If the date format is always DDMMMYY, then your solution is pretty good. Just post-process the result with a dict of mappings for the DD and YY parts, so that O becomes 0, S becomes 5, Z becomes 2, I becomes 1, etc. – Octo Poulos Sep 15 '22 at 13:45
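
The post-processing idea from the comments can be sketched as below, assuming the OCR result always has seven characters in DDMMMYY order. The function name `fix_date_text` is hypothetical, and only the O, S, Z and I mappings come from the comments; the B/8 pair and the reverse digit-to-letter mapping for the month part are assumptions added for illustration.

```python
# Look-alike corrections. O, S, Z and I are from the comments;
# B -> 8 is an assumed extra pair for illustration.
LETTER_TO_DIGIT = {"O": "0", "S": "5", "Z": "2", "I": "1", "B": "8"}
# Reverse mapping for the month part (an assumption, not from the comments).
DIGIT_TO_LETTER = {digit: letter for letter, digit in LETTER_TO_DIGIT.items()}

MONTHS = {"JAN", "FEB", "MAR", "APR", "MAY", "JUN",
          "JUL", "AUG", "SEP", "OCT", "NOV", "DEC"}


def fix_date_text(raw):
    """Correct an OCR result that should match DDMMMYY, position by position."""
    text = raw.strip().upper()
    if len(text) != 7:
        raise ValueError(f"expected 7 characters, got {text!r}")

    # Positions 0-1 and 5-6 must be digits; positions 2-4 must be letters.
    day = "".join(LETTER_TO_DIGIT.get(c, c) for c in text[:2])
    month = "".join(DIGIT_TO_LETTER.get(c, c) for c in text[2:5])
    year = "".join(LETTER_TO_DIGIT.get(c, c) for c in text[5:])

    # Reject anything the mapping could not turn into a plausible date string.
    if not (day.isdigit() and year.isdigit() and month in MONTHS):
        raise ValueError(f"could not repair {text!r}")
    return day + month + year


print(fix_date_text("OSJUNZ2"))  # prints 05JUN22
```

This does not stop Tesseract from confusing the characters in the first place. It only repairs the result afterwards, and it raises `ValueError` when the repaired string still does not look like a date.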

0 Answers