Questions tagged [pytesser]

PyTesser is an Optical Character Recognition module for Python. It takes as input an image or image file and outputs a string.

PyTesser uses the Tesseract OCR engine, converting images to an accepted format and calling the Tesseract executable as an external script. A Windows executable is provided along with the Python scripts. The scripts should work on other operating systems as well.

http://code.google.com/p/pytesser/
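
The description above (hand an image file to the Tesseract executable, then read back the text it writes) can be sketched roughly as follows. This is not PyTesser's actual source; the function names `build_tesseract_command` and `image_file_to_string` are hypothetical, and the sketch assumes a `tesseract` executable is available on the PATH and that the image is already in a format Tesseract accepts.

```python
import os
import subprocess
import tempfile


def build_tesseract_command(image_path, output_base, exe="tesseract"):
    """Return the argument list for `tesseract <image> <output_base>`.

    Hypothetical helper; `exe` is assumed to name a Tesseract binary on PATH.
    """
    return [exe, image_path, output_base]


def image_file_to_string(image_path, exe="tesseract"):
    """Run the Tesseract executable on an image file and return its text.

    Tesseract writes its result to `<output_base>.txt`, so a temporary
    directory holds that file until it has been read back.
    """
    with tempfile.TemporaryDirectory() as tmp_dir:
        output_base = os.path.join(tmp_dir, "ocr_result")
        command = build_tesseract_command(image_path, output_base, exe)
        # check=True raises CalledProcessError if Tesseract exits with an error.
        subprocess.run(command, check=True, capture_output=True)
        with open(output_base + ".txt", encoding="utf-8") as handle:
            return handle.read()
```

Calling `image_file_to_string("scan.png")` (with a placeholder file name) would return the recognized text as a string, provided Tesseract is installed. On Windows, `exe` could instead be given the full path to `tesseract.exe`.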

105 questions
4
votes
1 answer

pytesseract tessedit_char_whitelist not accepting quote

I have started working with pytesseract in Python. When I pass it a single or double quote in from PIL import Image import pytesseract import numpy as np tesseract_config = r"""-c…
Mixony
  • 63
  • 1
  • 7
3
votes
1 answer

Convert pytesseract string output to pandas df

I have been given receipts from Subway detailing sales, workers, etc throughout the day and need to extract the data for a management class. I took pictures of the receipts and processed them with pytesseract into a string separated by \n but now…
N.Fisher
  • 154
  • 1
  • 9
3
votes
1 answer

Pytesseract, trying to detect text from on screen

I'm using MSS in conjunction with pytesseract to try and read on-screen to determine a string of characters from the region being monitored. My code is as follows: import Image import pytesseract import cv2 import os import mss import numpy as…
Justin
  • 125
  • 1
  • 2
  • 12
3
votes
2 answers

unable to find tessdata for Tesseract

Hi, I am new to Python and Tesseract. I am using the Anaconda distribution and trying to use pytesseract-ocr. When I try to get the data from an image, it gives me the following error: tesseract imageSample1.jpg test.txt digits // output Tesseract Open Source…
nilkash
  • 7,408
  • 32
  • 99
  • 176
3
votes
1 answer

How To Install Pytesseract in windows 8.1(win64) (visual studio 2012+python+anaconda)

from PIL import Image from tesseract import image_to_string print image_to_string(Image.open('C:\Users/Uzel/Desktop/pythonfoto/denklem.png')) print image_to_string(Image.open('C:\Users/Uzel/Desktop/pythonfoto/denklem.png'), lang='eng') I use this…
3
votes
1 answer

How to improve OCR of text written on vehicles?

I am trying to do OCR of vehicles such as trains or trucks to identify the numbers and characters written on them. (Please note this is not license plate identification OCR) I took this image. The idea is to be able to extract the text - BN SF 721…
Piyush
  • 606
  • 4
  • 16
  • 38
3
votes
1 answer

Using PyTesser to break easy captcha

I am using PyTesser to break a captcha. PyTesser uses the Tesseract OCR engine. Before passing the image to PyTesser, I apply some filtering. Step by step, my code: input image is: from PIL import Image img = Image.open('1.gif') img =…
Moshi
  • 1,385
  • 2
  • 17
  • 36
3
votes
1 answer

Recognize simple digits with pytesser

I'm learning OCR using PyTesser and Tesseract. As the first milestone, I want to write a tool to recognize a captcha that simply consists of some digits. I read some tutorials and wrote the following test program. from pytesser.pytesser import * from PIL…
stanleyxu2005
  • 8,081
  • 14
  • 59
  • 94
2
votes
0 answers

PyTesseract FileNotFoundError: [Errno 2] No such file or directory: 'D:\\...........\\AppData\\Local\\Temp\\tess_467d8rol.osd'

Using PyTesseract method pytesseract.image_to_osd I find the error quoted before: FileNotFoundError: [Errno 2] No such file or directory: 'D:\...........\AppData\Local\Temp\tess_467d8rol.osd' This is the code I am using: # import the necessary…
willyro93
  • 143
  • 1
  • 10
2
votes
1 answer

NameError: name 'pytesseract' is not defined

Pytesseract is not recognized. I have tried all fixes documented online, including adding Tesseract-OCR to my Path variables, incorporating the pytesseract.pytesseract.tesseract_cmd = r'C:\Program Files\Tesseract-OCR\tesseract.exe' command path in…
Abagnale
  • 67
  • 1
  • 10
2
votes
0 answers

Python pytesseract incorrect text extraction from image

I'm trying to extract the text from an image using pytesseract, but I'm getting incorrect text. For example, when I try to read the following image, A = pytesseract.image_to_string(Image.open('A.png'),config='-psm 6',lang = 'eng') output is: Shoofing I…
2
votes
0 answers

Resize screenshot with mss for better reading with pytesseract

I need to resize a screenshot taken by mss in order to get better reading by pytesseract. I got it done with PIL + pyscreenshot but can't get it to work with mss. from numpy import array, flip from mss import mss from pytesseract import…
Siewdass Sf
  • 169
  • 1
  • 10
2
votes
2 answers

pytesseract results different from tesseract command line results

I am trying to convert a scanned page to text using both pytesseract and tesseract command line on Ubuntu. The results are remarkably different (pytesseract performs way better than tesseract command line) and I am unable to understand why. I…
2
votes
3 answers

pytesseract and image.tif file

I need to transcribe an image.tif with several pages to text using pytesseract. I have the following code: > from PIL import Image > import pytesseract > pytesseract.pytesseract.tesseract_cmd = 'C: / Program Files (x86) / Tesseract- > OCR / tesseract ' >…
Andrés
  • 21
  • 1
  • 3
2
votes
1 answer

Get text from image

I need to use pytesseract to extract text from this picture. However, when I used pytesseract, it didn't work. Here is my code: try: import Image except ImportError: from PIL import Image import…
GMB
  • 21
  • 1
  • 3