
My objective is to use OCR in Python 2.7 with Tesseract on a Windows 7 machine, but I am running into issues with the installation process. I tried following the instructions here, but the links to "tesseract-core-yyyymmdd.exe" and "tesseract-langs-yyyymmdd.exe" no longer exist, and I can't find these .exe files anywhere else online. Here's what I have done so far:

  1. Installed Tesseract using the executable from the official tesseract-ocr page.
  2. Installed the packages "wand", "PIL", and "pyocr" via pip.

Now, if I do the following in Python:

    from wand.image import Image
    from PIL import Image as PI
    import pyocr
    import pyocr.builders
    import io

These packages load without problems, but pyocr.get_available_tools() returns an empty list. I am sure this has to do with the missing installation .exe files mentioned above. Where can I find them? Or is there something else I am missing?
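One likely cause of the empty list: as far as I know, pyocr only reports Tesseract as available when it can find the `tesseract` executable on `PATH`, and the Windows installer does not necessarily add its folder there. The sketch below lets you check that directly. `find_on_path` is a hypothetical helper (not part of pyocr) that searches `PATH` roughly the way the shell does; it uses only `os`, so it should run on Python 2.7 as well as Python 3.

```python
import os


def find_on_path(name, path=None):
    """Return the full path of executable `name` if it is found in a
    PATH-style string, else None. Hypothetical helper, not part of pyocr."""
    if path is None:
        path = os.environ.get("PATH", "")
    # On Windows, executables need an extension such as ".exe";
    # PATHEXT lists the extensions the shell tries.
    extensions = [""]
    if os.name == "nt":
        extensions += os.environ.get("PATHEXT", ".EXE").lower().split(os.pathsep)
    for directory in path.split(os.pathsep):
        if not directory:
            continue  # skip empty entries such as a trailing separator
        for ext in extensions:
            candidate = os.path.join(directory.strip('"'), name + ext)
            if os.path.isfile(candidate):
                return candidate
    return None


if __name__ == "__main__":
    location = find_on_path("tesseract")
    if location is None:
        print("tesseract not found on PATH; add its install folder to PATH")
    else:
        print("tesseract found at " + location)
```

If it prints "not found", adding the Tesseract install folder to `PATH` and opening a new console is the first thing to try before reinstalling anything.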

Plug4

4 Answers

I just set up pytesseract and it works! I have Windows 10 and Python 2.7 installed.

All you need to do:

  1. Download the Microsoft Visual C++ Compiler for Python 2.7 from http://aka.ms/vcpython27 and install it (a common installation step).
  2. Download pytesseract (the Python wrapper for Tesseract) from PyPI: https://pypi.python.org/pypi/pytesseract

  3. Unzip the file.

  4. Go to the directory that contains the unzipped files.

  5. Run this command: "python setup.py install"

  6. (Optional) To test that it is installed, open a Python shell and run "import pytesseract".

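I hope it works! Note that pytesseract is not a separate OCR engine: it is a Python wrapper for Google's Tesseract-OCR engine and calls the tesseract executable, so Tesseract itself must also be installed.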
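Because pytesseract launches the `tesseract` executable, it fails if that executable is not on `PATH`. pytesseract lets you point at the binary explicitly through `pytesseract.pytesseract.tesseract_cmd`. A minimal sketch, assuming Tesseract was installed into one of the folders in `CANDIDATES` (these paths are assumptions; use whatever folder you chose in the installer). `pick_tesseract_cmd` and `configure_pytesseract` are hypothetical helper names.

```python
import os

# Assumed install locations; adjust to the folder you picked in the installer.
CANDIDATES = [
    r"C:\Program Files\Tesseract-OCR\tesseract.exe",
    r"C:\Program Files (x86)\Tesseract-OCR\tesseract.exe",
]


def pick_tesseract_cmd(candidates=CANDIDATES, exists=os.path.isfile):
    """Return the first candidate path for which `exists` is true, else None."""
    for path in candidates:
        if exists(path):
            return path
    return None


def configure_pytesseract(candidates=CANDIDATES):
    """Point pytesseract at a tesseract.exe that is not on PATH.

    Returns the path that was set, or None if pytesseract is not installed
    or no candidate exists (pytesseract then keeps looking on PATH)."""
    try:
        import pytesseract
    except ImportError:
        return None
    cmd = pick_tesseract_cmd(candidates)
    if cmd is not None:
        # tesseract_cmd is the module-level setting pytesseract uses
        # when it starts the tesseract process.
        pytesseract.pytesseract.tesseract_cmd = cmd
    return cmd
```

Call `configure_pytesseract()` once at startup; after that, calls such as `pytesseract.image_to_string(...)` should find the binary even if `PATH` was never updated.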

Step [1] To install Tesseract, visit

https://github.com/UB-Mannheim/tesseract/wiki

The installers can be downloaded from there. The latest ones at the time of writing were, e.g., tesseract-ocr-setup-3.05.02-20180621.exe, tesseract-ocr-w32-setup-v4.0.0-beta.1.20180608.exe (32-bit), and tesseract-ocr-w64-setup-v4.0.0-beta.1.20180608.exe (64-bit).

Step [2] Download and install the Microsoft Visual C++ Compiler for Python 2.7 from: https://download.microsoft.com/download/7/9/6/796EF2E4-801B-4FC4-AB28-B59FBF6D907B/VCForPython27.msi

Step [3] Install pytesseract, the Python binding for Tesseract, using pip:

pip install pytesseract

Step [4] Optionally, install an image-processing library for Python, e.g., Pillow:

pip install pillow

That's it, you are done! :)

Shashank Singh

pip is the package manager for Python packages.

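  1. Open cmd and run pip search "pytesseract" to see the latest version. (PyPI has since disabled the search API that pip search relies on, so this command now fails; check https://pypi.org/project/pytesseract/ instead.)
  2. Run pip install pytesseract for the latest version, or pip install pytesseract==0.3.0 for a specific version.
  3. In a Python shell on Windows, run import pytesseract to make sure the installation was successful.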
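Step 3 can also be done from a script, which helps when several Python installations are on the machine and you want to know whether the interpreter you are running actually sees the package. A minimal sketch; `module_status` is a hypothetical helper, and it reports a version only if the package defines `__version__`.

```python
import importlib


def module_status(name):
    """Return (importable, version) for module `name`.

    `version` is the module's __version__ attribute if it has one, else None."""
    try:
        module = importlib.import_module(name)
    except ImportError:
        return False, None
    return True, getattr(module, "__version__", None)


if __name__ == "__main__":
    ok, version = module_status("pytesseract")
    if ok:
        print("pytesseract imported, version: %s" % version)
    else:
        print("pytesseract is not installed for this interpreter")
```

`importlib.import_module` exists in both Python 2.7 and Python 3, so the same check works on either.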
Shurima

Install both of the following and you are done:

Tesseract binaries: https://github.com/UB-Mannheim/tesseract/wiki

Python wrapper (pytesseract): https://pypi.python.org/pypi/pytesseract

Abhishek