6

I'm using Tesseract in a project that runs with docker-compose. I don't know how to restrict it to a single processor core directly from my Python file. I want to do this because running Tesseract in parallel is slow and consumes excessive resources.

I found many similar topics, but they only deal with how to configure OMP_THREAD_LIMIT on the command line. Here is how Tesseract is configured in my Python code:

__tesseract_config_without_dir = "--psm 3 --oem 1 --dpi 300"

TESSERACT_DATA = os.environ.get(
    "TESSDATA_PREFIX", "/usr/share/tesseract-ocr/4.00/tessdata/"
)

__tesseract_config = (
    __tesseract_config_without_dir
    + ' --tessdata-dir "{}"'.format(config.TESSERACT_DATA)
)

So I would like to add an option like 'OMP_THREAD_LIMIT=1' to my __tesseract_config, but I don't know how to write it. The Tesseract documentation only provides this information:

"ENVIRONMENT VARIABLES

OMP_THREAD_LIMIT

If the tesseract executable was built with multithreading support, it will normally use four CPU cores for the OCR process. While this can be faster for a single image, it gives bad performance if the host computer provides less than four CPU cores or if OCR is made for many images. Only a single CPU core is used with OMP_THREAD_LIMIT=1."

André C. Andersen
Elisa Lopez

2 Answers

6

To disable multithreading in Tesseract, I just added this at the beginning of my code:

import os
os.environ['OMP_THREAD_LIMIT'] = '1'
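
Whether this takes effect depends on the variable being set before the Tesseract process starts. Values assigned through os.environ are inherited by child processes that Python launches afterwards, so no shell-level export is needed in that case. The following is only a minimal sketch of how one could check this: it assumes Tesseract is run as a child process (for example through a wrapper that calls the tesseract executable), and it uses the Python interpreter as a stand-in for that executable. The helper name child_sees_thread_limit is hypothetical.

```python
import os
import subprocess
import sys

# Set the limit before any Tesseract process is started.
os.environ["OMP_THREAD_LIMIT"] = "1"


def child_sees_thread_limit():
    """Start a child process and return the OMP_THREAD_LIMIT value it sees."""
    # Assumption for illustration: sys.executable stands in for the tesseract
    # binary. Any child process inherits os.environ in the same way.
    result = subprocess.run(
        [sys.executable, "-c",
         "import os; print(os.environ.get('OMP_THREAD_LIMIT'))"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


if __name__ == "__main__":
    print(child_sees_thread_limit())
```

If the project has many Python files, the assignment only needs to run once, early in the process that eventually launches Tesseract, for example in the entry-point module before any OCR call.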
OanaM
  • Hello. Thank you for your reply. I tried it, but I feel like nothing has changed. I have a lot of Python files in my project, so I don't know whether I must put it in the file where the parallelism happens, in the configuration file, or in another one. – Elisa Lopez Jul 27 '20 at 15:55
  • I think you should add this in the file where you instantiate Tesseract – OanaM Aug 07 '20 at 10:14
  • Does the Python environment get pushed into the process that is running Tesseract? I don't know for sure, but without the variable being EXPORTed, instinct says this will not work. – David Medinets Aug 20 '20 at 12:01
2

Write a bash script that starts your Python program. In that script, before your program is started, run the following command:

export OMP_THREAD_LIMIT=1
David Medinets