I am testing pytesseract, using it to extract digits from images like the one below.

[Image: the digit string 456-78-0000]

The image is of fairly decent quality (200 dpi). However, when I run pytesseract, it gives me the result 456-/8-0000, where the digit 7 is misrecognized as '/'. While "/" obviously bears some resemblance to the digit 7, given the high quality of the image, I am still surprised by it.

I tried both

pytesseract.image_to_string(img)

and

pytesseract.image_to_string(img, lang='eng', config='--psm 13 --oem 2 -c tessedit_char_whitelist=0123456789-')

Both yielded the same result.

Any pointers on how to improve the recognition accuracy would be great. Thanks!

Alex

1 Answer

Which version of Tesseract are you using, and which tessdata? With a recent Tesseract and the eng model from tessdata-best, the result is perfect:

> tesseract 0mIe5.png  - quiet
456-78-0000
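
For reference, here is a rough sketch of how the same check might look from pytesseract. It prints the installed Tesseract version and points Tesseract at a tessdata-best directory through the `--tessdata-dir` option. The helper names `build_tesseract_config` and `ocr_digits` and the directory path are assumptions made up for illustration, and the path is assumed to contain no spaces. I have not run this against the image from the question, so treat it as a starting point only.

```python
def build_tesseract_config(tessdata_dir=None, psm=None):
    """Assemble a config string for pytesseract (hypothetical helper).

    tessdata_dir: directory holding eng.traineddata from tessdata-best
                  (assumed to contain no spaces).
    psm: optional page segmentation mode; omitted to keep Tesseract's default.
    """
    parts = []
    if tessdata_dir:
        parts += ["--tessdata-dir", tessdata_dir]
    if psm is not None:
        parts += ["--psm", str(psm)]
    return " ".join(parts)


def ocr_digits(image_path, tessdata_dir=None):
    """Run OCR on one image and return the stripped text (hypothetical helper)."""
    # Imported lazily so the config helper above works without Tesseract installed.
    import pytesseract
    from PIL import Image

    # Shows which Tesseract binary pytesseract is actually calling.
    print("Tesseract version:", pytesseract.get_tesseract_version())

    with Image.open(image_path) as img:
        text = pytesseract.image_to_string(
            img,
            lang="eng",
            config=build_tesseract_config(tessdata_dir),
        )
    return text.strip()
```

Usage would be something like `ocr_digits("0mIe5.png", "/path/to/tessdata_best")`, where the second argument is a placeholder for wherever the tessdata-best files were downloaded.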
user898678