
I am doing OCR using Tesseract on a quad-core processor. For better speed, I want to read 4 words at a time, using 4 threads. Is it safe to call Tesseract from multiple threads concurrently?

Note: each thread will be working on a different, non-shared image.

Note: guarding the calls with locks is not an option, because serializing them would throw away the speed-up.

Hristo Hristov
  • Code can be thread-safe without being reentrant. It sounds like you want thread-safety, not necessarily reentrancy. – Marcelo Cantos Jan 28 '11 at 12:02
  • Yes, it can be made thread-safe by using locks, but I need it to be reentrant for speed: the code should execute in parallel. – Hristo Hristov Jan 28 '11 at 12:16

2 Answers


According to the release notes, Tesseract is thread-safe (mostly, and to the degree you describe needing) as of version 3.01 (Oct 21, 2011):

Thread-safety! Moved all critical globals and statics to members of the appropriate class. Tesseract is now thread-safe (multiple instances can be used in parallel in multiple threads.) with the minor exception that some control parameters are still global and affect all threads.

I've been using it successfully on multiple cores since then (and even earlier, from the dev branch).
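The setup the release notes describe (one engine instance per thread, nothing shared, no locks around recognition) can be sketched in Python as below. This is a minimal sketch, not Tesseract's API: `ocr_in_threads`, `make_engine` and the `recognize(path)` method are hypothetical names. With a binding such as tesserocr you might wrap `PyTessBaseAPI` so that `recognize` calls `SetImageFile` and then `GetUTF8Text`, but treat that mapping as an assumption and check the binding's documentation. Two further caveats: in CPython, threads only give a real speed-up if the engine's native code releases the GIL while it works, and since some control parameters are still global, it is probably safest not to change them from individual threads.

```python
import threading
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, List, Protocol


class OcrEngine(Protocol):
    """Hypothetical minimal interface for one OCR engine instance."""

    def recognize(self, image_path: str) -> str: ...


def ocr_in_threads(
    image_paths: Iterable[str],
    make_engine: Callable[[], OcrEngine],
    workers: int = 4,
) -> List[str]:
    """OCR every image on a pool of threads, one engine instance per thread.

    No engine is ever shared between threads, so no lock is taken around
    recognition. Results come back in the same order as image_paths.
    """
    local = threading.local()  # per-thread storage, fresh for each call

    def engine_for_this_thread() -> OcrEngine:
        # Lazily create exactly one engine the first time a worker needs it.
        engine = getattr(local, "engine", None)
        if engine is None:
            engine = make_engine()
            local.engine = engine
        return engine

    def work(path: str) -> str:
        return engine_for_this_thread().recognize(path)

    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(work, image_paths))
```

Creating the engine lazily inside each worker means initialization cost is paid once per thread (at most `workers` times), not once per image.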

Kaolin Fire

I don't think Tesseract is currently parallelizable (see this thread), although making it more thread-safe is one of the main goals for v3.0.

However, you can always parallelize by running n concurrent Tesseract processes. If you want to parallelize the OCR of a single image, it is up to you to split the image and feed each part to one of these n processes (basically a map-reduce).
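A minimal Python sketch of the multi-process approach: a small thread pool launches one `tesseract` process per image, at most `workers` at a time, and collects each process's standard output. This also handles dispatching inputs and collecting outputs, which the comments raise as a concern. The threads only wait on child processes, so the actual OCR runs in parallel regardless of Python's GIL. `ocr_with_processes` and `tesseract_cmd` are hypothetical names, and the assumption that your `tesseract` CLI accepts `stdout` as the output base (newer releases do; older ones need a file base name) should be checked against your installed version.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence


def tesseract_cmd(image_path: str) -> List[str]:
    # Assumption: this tesseract build accepts "stdout" as the output base,
    # so recognized text is printed instead of written to a .txt file.
    return ["tesseract", image_path, "stdout"]


def ocr_with_processes(
    image_paths: Sequence[str],
    workers: int = 4,
    build_cmd: Callable[[str], List[str]] = tesseract_cmd,
) -> List[str]:
    """Run up to `workers` OCR processes at once and collect their stdout.

    Results are returned in the same order as image_paths.
    """

    def run_one(path: str) -> str:
        result = subprocess.run(
            build_cmd(path),
            stdout=subprocess.PIPE,
            stderr=subprocess.PIPE,
            text=True,
            check=True,  # raise CalledProcessError if a process fails
        )
        return result.stdout

    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(run_one, image_paths))
```

Passing `build_cmd` as a parameter keeps the dispatch logic independent of the exact command line, so the same pool can drive a different OCR tool or extra Tesseract options.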

Mauricio Scheffer
  • Parallelization with processes will be much harder... I have a process that generates many different images at a time. Feeding these images to the tesseract processes is possible and will do the trick, but I will need a way to talk to the processes and to dispatch and collect the input/output. – Hristo Hristov Jan 31 '11 at 07:17
  • @Hristo: I had exactly the same problem with GeckoFX and solved it with TPL + proxies: http://bugsquash.blogspot.com/2010/03/proxying-and-parallelizing-processes.html – Mauricio Scheffer Jan 31 '11 at 12:45