27

I want to know what parameters the config file used by Tesseract OCR accepts, how to write a config file, etc.

I can't find any documentation about this on their site. How can I determine what parameters are supported, and what they mean?

Mogsdad
sashoalm

3 Answers

19

I found these instructions in the link below. They are about writing the config file and where to place it:

A config file is a simple text file without a BOM and with Unix end-of-line marks (on Windows, an advanced text editor such as Notepad++ can produce this).

If you use the tesseract executable, this is the only way to change Tesseract parameters.

The config file should be located in your tessdata/configs directory. Have a look there for some examples.
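
The requirements quoted above (no BOM, Unix line endings) can also be met from a script instead of a text editor. The following is only a minimal sketch: the function name write_config and the file name used in the note below are hypothetical, and the parameter names are the ones that appear elsewhere on this page. It writes raw bytes so that no platform-specific newline translation can occur.

```python
from pathlib import Path


def write_config(path, params):
    """Write key/value pairs as a Tesseract-style config file.

    Hypothetical helper: one "name value" pair per line, encoded as
    UTF-8 without a BOM, with "\n" (Unix) line endings.
    """
    lines = [f"{name} {value}" for name, value in params.items()]
    # Plain "utf-8" never emits a BOM (unlike "utf-8-sig").
    data = ("\n".join(lines) + "\n").encode("utf-8")
    # write_bytes bypasses text-mode newline translation, so the file
    # keeps "\n" line endings even on Windows.
    Path(path).write_bytes(data)
    return data
```

Calling write_config("myconfig", {"interactive_display_mode": "T"}) would produce a one-line file; copying it into tessdata/configs is left to the reader, since that directory's location depends on the installation.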

There is a list of all the variables, with a description of each one, at http://www.sk-spell.sk.cx/tesseract-ocr-parameters-in-302-version. Note that it is for Tesseract 3.02; things may be different in other versions.

Edit: Also adding a pastebin link in case the above link becomes dead.

Community
sashoalm
19

Tesseract v3.04 now offers the command line option --print-parameters, so you can call tesseract --print-parameters to get a list of the 678 (!) configurable parameters, their default values, and a short description:

Tesseract parameters:
editor_image_xpos   590 Editor image X Pos
editor_image_ypos   10  Editor image Y Pos
editor_image_menuheight 50  Add to image height for menu bar
editor_image_word_bb_color  7   Word bounding box colour
editor_image_blob_bb_color  4   Blob bounding box colour
editor_image_text_color 2   Correct text colour
...and many, many more
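
If you want to search or filter that listing, it can be loaded into a dictionary. This is a rough sketch, not an official parser: the function name parse_parameters is made up, and it assumes each line holds a name, a value and a description separated by tabs, falling back to runs of whitespace when no tab is present (as in the listing pasted above). With the whitespace fallback, a parameter whose default value is empty would be misread.

```python
HEADER = "Tesseract parameters:"


def parse_parameters(text):
    """Parse --print-parameters style text into {name: (value, description)}.

    Hypothetical helper. Assumes "name<TAB>value<TAB>description" lines;
    falls back to splitting on whitespace when a line contains no tab.
    """
    params = {}
    for line in text.splitlines():
        if not line.strip() or line.strip() == HEADER:
            continue  # skip blank lines and the heading line
        if "\t" in line:
            parts = line.split("\t", 2)
        else:
            parts = line.split(None, 2)
        if len(parts) < 2:
            continue  # not a "name value ..." line
        name, value = parts[0], parts[1]
        description = parts[2] if len(parts) > 2 else ""
        params[name] = (value, description)
    return params
```

The input text would be whatever you captured from tesseract --print-parameters, for example by redirecting the command's output to a file and reading that file.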
chbrown
  • I can't work out how to feed a file generated from this, after modification, back into Tesseract - any ideas :\ – jtlz2 Sep 04 '19 at 13:41
  • @jtlz2 ooh, good _question_! Esp. considering tesseract is currently a whole major version (4.1.0) newer than when I posted my answer, you should re-post that as a new question. – chbrown Sep 07 '19 at 19:33
  • like this? https://stackoverflow.com/questions/57794165/tesseract-differing-output-how-do-i-find-out-which-parameters-are-being-used – jtlz2 Sep 09 '19 at 07:43
  • Is there any documentation on each of them in the official docs? – Ahmad Anis Jun 08 '22 at 07:43
10

It's just a plain text file containing space-delimited key/value pairs for Tesseract config variables, each on a separate line; for instance:

interactive_display_mode T
tessedit_display_outwords T

There are several standard config files, such as digits and hocr, under Tesseract's tessdata/configs folder.
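
To make the format concrete, here is a small sketch that reads such a file back into a dictionary. The name parse_config is hypothetical, and the sketch only relies on what is described above: one key and one value per line, separated by whitespace. Anything else a real config file might contain is not handled.

```python
def parse_config(text):
    """Read "key value" lines into a dict (hypothetical helper).

    Each non-blank line is split at the first run of whitespace:
    the first field is the variable name, the rest is its value.
    """
    config = {}
    for line in text.splitlines():
        fields = line.strip().split(None, 1)
        if not fields:
            continue  # blank line
        key = fields[0]
        value = fields[1] if len(fields) > 1 else ""
        config[key] = value
    return config
```

Applied to the two example lines above, this yields a dictionary mapping interactive_display_mode and tessedit_display_outwords to the string "T".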

nguyenq
  • Where can I find a list of all the config variables, and of the values they can take? – sashoalm Oct 27 '12 at 05:42
  • Please refer to this post: http://stackoverflow.com/questions/13087252/where-i-can-find-the-list-of-available-property-name-for-tesseract-setvariable – nguyenq Oct 27 '12 at 22:12
  • And how is the config file saved? I mean, what filename should I give it? And how does the tesseract command use that config file specifically? :( I'm a bit confused. @nguyenq – gumuruh Nov 09 '15 at 03:12