Tesseract options & image preprocessing

Question

Edit: As asked, here is the original image.

Dear community, I am trying to do some OCR.
I have already pre-processed the image a lot (deskew, crop...).
Now I can read the digits myself with no problem,
but I can't get Tesseract to give me a meaningful result.

Click on the link at the top to see the image I am trying to OCR

Is there more pre-processing I am missing?
Or am I calling Tesseract incorrectly?

I tried with no options at all, and with this:

config = ('--psm 13 -c tessedit_char_whitelist=0123456789')

Edit:

Funny thing, I tried multiple ways:

Tesseract 5 on Windows gives nothing useful: 'eT' (but maybe a bad config)
Google API from a Python Jupyter Notebook on Windows => 'UO0 1124' or something like that, I don't quite remember
Tesseract 4 on Ubuntu with config = ('-l eng --oem 1 --psm 13') => 'WU000 244m'
Google API from a Python Jupyter Notebook on Ubuntu => 'U000241\n'

So it's the very beginning for me. I may prefer to use Tesseract so as not to pay big bucks. I will see what I can do when my project is more advanced.

But I am eager to hear your suggestions about image preprocessing, if you have any! :-)

Regards !

Is posted image original or preprocessed? If preprocessed then please post original. — user898678, Sep 29 '19 at 12:54

score 10 · Answer 1 · answered Sep 28 '19 at 18:58

Three important flags control how Tesseract works: -l, --oem, and --psm.

The -l flag sets the language of the input text.
The --oem flag, or OCR Engine Mode, selects the type of algorithm used by Tesseract.
The --psm flag sets the Page Segmentation Mode used by Tesseract.

To list the available options, use:

tesseract --help-oem for the OCR engine modes.
tesseract --help-psm for the page segmentation modes.
https://github.com/tesseract-ocr/tesseract/wiki/Data-Files for the language codes.

Use these options like this: config = ("-l eng --oem 1 --psm 7")
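
As a rough sketch of how such a config string might be assembled and passed along, the snippet below assumes the pytesseract wrapper and Pillow are installed (the question's config = (...) style suggests pytesseract, but that is an assumption). The helper names build_config and ocr_image are hypothetical, made up for illustration only.

```python
def build_config(lang="eng", oem=1, psm=7, whitelist=None):
    """Assemble a Tesseract config string from the -l, --oem and --psm flags.

    Hypothetical helper, not part of Tesseract or pytesseract.
    """
    parts = ["-l", lang, "--oem", str(oem), "--psm", str(psm)]
    if whitelist:
        # Same whitelist variable the question already uses.
        parts += ["-c", f"tessedit_char_whitelist={whitelist}"]
    return " ".join(parts)


def ocr_image(path, config):
    """Run OCR on one image file (assumes pytesseract and Pillow are available)."""
    # Imported lazily so build_config can be used without these packages.
    from PIL import Image
    import pytesseract

    return pytesseract.image_to_string(Image.open(path), config=config)


if __name__ == "__main__":
    import sys

    # --psm 7 treats the image as a single line of text.
    config = build_config(psm=7, whitelist="0123456789")
    print(config)
    if len(sys.argv) > 1:
        # Pass the path of the image to OCR as the first argument.
        print(repr(ocr_image(sys.argv[1], config)))
```

Trying a few --psm values this way (7 for a single line, 13 for a raw line, as in the question) is a cheap way to see whether the segmentation mode or the image itself is the problem.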
