I'm using Python 3.6 on Windows 10 and already have pytesseract installed, but I came across tesserocr in some code (which, by the way, I can't install). What is the difference?
-
In addition to [this answer](https://stackoverflow.com/a/56387215/11630056) in Tesserocr there is no support for Python 3.8 (April 2020). – Gustaw Solski Apr 01 '20 at 12:58
3 Answers
From my experience, `tesserocr` is much faster than `pytesseract`.
`tesserocr` is a Python wrapper around the Tesseract C++ API, whereas `pytesseract` is a wrapper around the `tesseract-ocr` CLI.
With `tesserocr` you can pre-load the model once at the beginning of your program and then run it separately as often as needed (for example, in a loop to process video frames).
With `pytesseract`, each call to the `image_to_string` function loads the model and then processes the image, which makes it slower for repeated calls.
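The "load once, recognise many times" pattern described above can be sketched roughly as follows. This is only an illustration: `ocr_many`, `run_demo`, and the file names passed to it are hypothetical names, and the sketch assumes `tesserocr` and Pillow are installed when `run_demo` is called.

```python
from typing import Iterable, List


def ocr_many(api, images: Iterable) -> List[str]:
    """Run OCR on several images with one already-initialised engine.

    `api` is expected to be a tesserocr.PyTessBaseAPI instance (or any
    object that exposes SetImage and GetUTF8Text).
    """
    texts = []
    for image in images:
        api.SetImage(image)              # hand the next image to the loaded engine
        texts.append(api.GetUTF8Text())  # recognise it without re-initialising
    return texts


def run_demo(paths: List[str]) -> List[str]:
    """Hypothetical helper: OCR the given image files with a single engine."""
    # Imported lazily so that ocr_many can be used without these packages.
    import tesserocr
    from PIL import Image

    # The context manager initialises Tesseract once and releases it on exit.
    with tesserocr.PyTessBaseAPI() as api:
        return ocr_many(api, (Image.open(p) for p in paths))
```

Calling `run_demo(['frame_000.png', 'frame_001.png'])` (made-up file names) would initialise the engine a single time and reuse it for both images, whereas the equivalent `pytesseract` loop would start the `tesseract` program once per image.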
To install `tesserocr`, I just typed `pip install tesserocr` in the terminal.
To use `tesserocr`:
import tesserocr
from PIL import Image
api = tesserocr.PyTessBaseAPI()
pil_image = Image.open('sample.jpg')
api.SetImage(pil_image)
text = api.GetUTF8Text()
To install `pytesseract`: `pip install pytesseract`.
To run it:
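import cv2
import pytesseract
# OpenCV loads images as BGR, while pytesseract assumes RGB, so convert first
image = cv2.cvtColor(cv2.imread('sample.jpg'), cv2.COLOR_BGR2RGB)
text = pytesseract.image_to_string(image)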

-
`cv2 has no imread member`, but I get info when hovering over imread() in VSCode. Maybe cv2 is not installed, but I `pip`ed it and python -c 'import cv2' shows no error, just nothing. – Timo Nov 11 '20 at 19:52
-
Another speed impact is that pytesseract (as of the time of writing this comment) always writes images to disk instead of piping them directly to tesseract; see https://github.com/madmaze/pytesseract/issues/172 – j-hap Mar 12 '21 at 07:34
Pytesseract is a Python "wrapper" for the tesseract binary. It offers only the following functions, along with specifying flags (man page):
- `get_tesseract_version`: Returns the Tesseract version installed in the system.
- `image_to_string`: Returns the result of a Tesseract OCR run on the image to string.
- `image_to_boxes`: Returns result containing recognized characters and their box boundaries.
- `image_to_data`: Returns result containing box boundaries, confidences, and other information. Requires Tesseract 3.05+. For more information, please check the Tesseract TSV documentation.
- `image_to_osd`: Returns result containing information about orientation and script detection.

See the project description for more information.
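As an illustration of what these functions give you beyond plain text, here is a minimal sketch that keeps only the words `image_to_data` reports with a reasonable confidence. The helper names `confident_words` and `words_from_image` and the threshold of 60 are assumptions made for this example, not part of pytesseract.

```python
from typing import Dict, List, Tuple


def confident_words(data: Dict[str, list], min_conf: float = 60.0) -> List[Tuple[str, float]]:
    """Filter a dict shaped like image_to_data(..., output_type=Output.DICT).

    Only the parallel 'text' and 'conf' lists are used here.
    """
    words = []
    for text, conf in zip(data["text"], data["conf"]):
        conf = float(conf)  # confidences may arrive as strings or numbers
        # Rows that are not words have empty text and a confidence of -1.
        if text.strip() and conf >= min_conf:
            words.append((text, conf))
    return words


def words_from_image(path: str, min_conf: float = 60.0) -> List[Tuple[str, float]]:
    """Hypothetical helper: run pytesseract on a file and filter the result."""
    # Imported lazily so that confident_words works without these packages.
    import pytesseract
    from PIL import Image

    data = pytesseract.image_to_data(
        Image.open(path), output_type=pytesseract.Output.DICT
    )
    return confident_words(data, min_conf)
```

The filtering is kept separate from the OCR call so it can be tried on a hand-written dict without Tesseract installed.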
On the other hand, tesserocr interfaces directly with Tesseract's C++ API (APIExample) which is much more flexible/complex and offers advanced features.

`pytesseract` is only a binding for `tesseract-ocr` for Python. So, if you want to use `tesseract-ocr` in Python code without using the `subprocess` or `os` module to run command-line `tesseract-ocr` commands, you use `pytesseract`. But in order to use it, you have to have `tesseract-ocr` installed.
You can think of it this way: you need `tesseract-ocr` installed because it's the program that actually runs and does the OCR. But if you want to run it from Python code as a function, you install the `pytesseract` package, which enables you to do that. So when you run `pytesseract.image_to_string(Image.open('test-european.jpg'), lang='fra')`, it calls `tesseract-ocr` with the provided arguments. The results are the same as running `tesseract test-european.jpg stdout -l fra`. So you get the ability to call that from code, but in the end it still has to run `tesseract-ocr` to do the actual OCR.
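To make the description above concrete, here is a rough sketch of doing the same thing by hand with `subprocess`. It is not pytesseract's actual implementation (which, among other things, goes through temporary files); `build_tesseract_command` and `ocr_via_cli` are hypothetical names, and the sketch assumes the `tesseract` executable is on your `PATH`.

```python
import subprocess
from typing import List, Optional


def build_tesseract_command(image_path: str, lang: Optional[str] = None) -> List[str]:
    """Build the argument list for the tesseract CLI.

    'stdout' as the output base makes tesseract print the recognised text
    instead of writing it to a file.
    """
    command = ["tesseract", image_path, "stdout"]
    if lang:
        command += ["-l", lang]  # e.g. 'fra' for French
    return command


def ocr_via_cli(image_path: str, lang: Optional[str] = None) -> str:
    """Run the tesseract program once and return whatever it printed."""
    result = subprocess.run(
        build_tesseract_command(image_path, lang),
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,  # decode output as text (works on Python 3.6)
        check=True,               # raise CalledProcessError if tesseract fails
    )
    return result.stdout
```

Every call to `ocr_via_cli` starts a new `tesseract` process, which is the per-call overhead that a binding to the C++ API avoids.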

-
Thanks a lot, now I understand... Do you have any idea on how to install tesserocr? If you have it installed what are the steps you followed and what version of Visual Studio you are using. Thank you again! – Soufiane S Feb 19 '19 at 09:25
-
I have already installed tesseract for Windows, I need to install [tesserocr](https://pypi.org/project/tesserocr/) for python but it fails... – Soufiane S Feb 19 '19 at 09:36
-
Then download the desired version from [here](https://github.com/simonflueckiger/tesserocr-windows_build/releases) and then just run `pip install .whl` – Novak Feb 19 '19 at 09:38
-
This does not answer what tesserocr is, which is different from tesseract-ocr, as explained in https://stackoverflow.com/a/56387215/4974791 – Guillermo González de Garibay Apr 22 '20 at 11:22