How to increase Pytesseract's accuracy in extracting digits

Question

I am testing Pytesseract and using it to extract digits from images like the one below.

The image is of fairly decent quality (200 dpi). However, when I run pytesseract, it gives me the result 456-/8-0000, where the digit 7 is misrecognized as '/'. While "/" obviously bears some resemblance to the digit 7, given the high quality of the image, I am still surprised by it.

I tried both

pytesseract.image_to_string(img)

and

pytesseract.image_to_string(img, lang='eng', config='--psm 13 --oem 2 -c tessedit_char_whitelist=0123456789-')

Both yielded the same result.

Any pointers on how to improve the recognition accuracy would be great. Thanks!

score 0 · Answer 1 · answered Jul 05 '19 at 19:15

Which version of Tesseract are you using, and which tessdata? With a recent Tesseract and the eng model from tessdata-best, the result is perfect:

> tesseract 0mIe5.png  - quiet
456-78-0000
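To check the same things from Pytesseract, something like the sketch below may help. It prints the Tesseract version that pytesseract actually calls and passes a `--tessdata-dir` so the eng model from tessdata-best is picked up. The helper names `build_config` and `ocr_line`, the directory path in the usage comment, and the choice of `--psm 7` (single text line) with `--oem 1` (LSTM only) are my assumptions, not something taken from the question; adjust them to your setup.

```python
def build_config(tessdata_dir=None, psm=7, oem=1):
    """Assemble the config string handed to pytesseract.

    tessdata_dir: folder holding eng.traineddata (e.g. a tessdata-best checkout).
    psm 7 treats the image as a single text line; oem 1 selects the LSTM engine.
    Both defaults are assumptions for this example.
    """
    parts = []
    if tessdata_dir:
        # Quote the path so directories containing spaces survive argument splitting.
        parts.append(f'--tessdata-dir "{tessdata_dir}"')
    parts.append(f"--psm {psm}")
    parts.append(f"--oem {oem}")
    return " ".join(parts)


def ocr_line(image_path, tessdata_dir=None):
    """Run OCR on one image and return the recognized text."""
    # Imported here so build_config can be used without the OCR stack installed.
    import pytesseract
    from PIL import Image

    # Confirms which Tesseract binary version pytesseract is driving.
    print("Tesseract version:", pytesseract.get_tesseract_version())

    config = build_config(tessdata_dir)
    text = pytesseract.image_to_string(Image.open(image_path), lang="eng", config=config)
    return text.strip()


# Hypothetical usage; replace the path with wherever tessdata-best lives:
# print(ocr_line("0mIe5.png", tessdata_dir="/path/to/tessdata_best"))
```

If the printed version is an old 3.x release, or the eng.traineddata in use is not the tessdata-best one, that may explain the difference from the command-line result above.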

user898678
