
The image I am trying to OCR

Edit: As asked, here is the original image

Dear community, I am trying to do some OCR.
I have already pre-processed the image a lot (deskew, crop...).
Now I can read the digits myself with no problem,
but I can't get Tesseract to give me a meaningful result.

Click on the link at the top to see the image I am trying to OCR

Is there more pre-processing that I am missing?
Or am I calling Tesseract incorrectly?

I tried with no options at all, and with this:

config = ('--psm 13 -c tessedit_char_whitelist=0123456789')
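For reference, a minimal sketch of how that config string might be passed to Tesseract from Python. It assumes the `pytesseract` wrapper and Pillow are installed (the post does not say which wrapper is used), and `digits.png` is a hypothetical file name. The `keep_digits` helper is an illustrative fallback for cases where the whitelist is not honored by the engine in use; it is not part of Tesseract.

```python
import re

# Config from the question: treat the image as a raw line, digits only.
CONFIG = "--psm 13 -c tessedit_char_whitelist=0123456789"


def keep_digits(text: str) -> str:
    """Drop every character that is not an ASCII digit."""
    return re.sub(r"[^0-9]", "", text)


def ocr_digits(image_path: str, config: str = CONFIG) -> str:
    """Run Tesseract on one image and return only the digits it found."""
    # Imported lazily so keep_digits stays usable without these packages.
    import pytesseract
    from PIL import Image

    with Image.open(image_path) as image:
        raw = pytesseract.image_to_string(image, config=config)
    return keep_digits(raw)


# Example call (hypothetical path): ocr_digits("digits.png")
```

Filtering after the fact only removes stray characters; it cannot recover digits that were misread as letters.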

Edit:

Funny thing, I tried multiple ways:

  • Tesseract 5 on Windows gives nothing useful, 'eT' (but maybe a bad config)
  • Google API from a Python Jupyter Notebook on Windows => 'UO0 1124' or something like that, I don't quite remember
  • Tesseract 4 on Ubuntu with config = ('-l eng --oem 1 --psm 13'): 'WU000 244m'
  • Google API from a Python Jupyter Notebook on Ubuntu => 'U000241\n'

So it's the very beginning for me. I may prefer to use Tesseract so as not to pay big bucks. I will see what I can do when my project is more advanced.

But I am eager to hear any suggestions you have about image preprocessing! :-)

Regards!

Antoine Driot

1 Answer


Three important flags control how Tesseract works: -l, --oem, and --psm.

  • The -l flag controls the language of the input text.

  • The --oem argument, or OCR Engine Mode, controls the type of algorithm used by Tesseract.

  • The --psm argument controls the automatic Page Segmentation Mode used by Tesseract.

To get the options, use:

Use these options like this:

config = ("-l eng --oem 1 --psm 7")
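As a sketch of how the three flags combine, the hypothetical helper `build_config` below assembles the string and checks the mode numbers against the ranges that Tesseract 4 and 5 accept (`--oem` 0 to 3, `--psm` 0 to 13). The helper and its parameter names are illustrative assumptions, not part of Tesseract or pytesseract.

```python
def build_config(lang: str = "eng", oem: int = 1, psm: int = 7) -> str:
    """Assemble a Tesseract config string from the three flags."""
    if oem not in range(4):   # 0 legacy, 1 LSTM, 2 legacy + LSTM, 3 default
        raise ValueError(f"--oem must be between 0 and 3, got {oem}")
    if psm not in range(14):  # e.g. 7 = single text line, 13 = raw line
        raise ValueError(f"--psm must be between 0 and 13, got {psm}")
    return f"-l {lang} --oem {oem} --psm {psm}"


config = build_config("eng", oem=1, psm=7)
print(config)  # -l eng --oem 1 --psm 7
```

With pytesseract, this string would go in the `config` argument of `image_to_string`. On Tesseract 4 and later, `tesseract --help-psm` and `tesseract --help-oem` print the available modes.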

Ramesh Kamath