Questions tagged [python-tesseract]

Python-tesseract is a wrapper class for Tesseract OCR that can read any conventional image file (JPG, GIF, PNG, TIFF, etc.) and return its text, detailed data about that text, or even convert it to PDF.


Tesseract is advertised as the most accurate open source OCR engine available. It was developed at HP Labs between 1985 and 1995 and then remained dormant until 2006, when Google revived the project.

For more information, please see the Python-tesseract page or the Tesseract page.

1664 questions

3 answers

Getting an error when using the image_to_osd method with pytesseract

Here's my code: import pytesseract import cv2 from PIL import Image pytesseract.pytesseract.tesseract_cmd = r"C:\Program Files (x86)\Tesseract-OCR\tesseract.exe" def main(): original = cv2.imread('D_Testing.png', 0) # binary thresh it at…

python-3.x ocr tesseract python-tesseract pytesser

asked Jan 04 '19 at 22:38

Bob Stoops

1 answer

Preserving Spaces in Tesseract

I have an image file that contains some text separated by tabs (2 spaces). But when I extract text from this image file, I always get a single space between two columns. Example: IMAGE: col-a col-b col-c Desired output: col-a …

python python-tesseract

asked Aug 03 '18 at 08:26

raghu
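
A possible approach for this question, offered as a sketch rather than an accepted answer: Tesseract has a `preserve_interword_spaces` parameter, and pytesseract forwards its `config` string to the tesseract command line, where `-c name=value` sets such a parameter. The helper `build_tesseract_config` below is a hypothetical name that only composes that string; `--psm 6` (a single uniform block of text) is an assumed page-segmentation mode, and how faithfully the original gap widths come back depends on the image.

```python
def build_tesseract_config(psm=None, oem=None, variables=None):
    """Compose a string for pytesseract's `config` argument (hypothetical helper).

    Each entry in `variables` becomes a `-c name=value` flag, which the
    tesseract command line uses to set an internal parameter.
    """
    parts = []
    if oem is not None:
        parts.append(f"--oem {int(oem)}")
    if psm is not None:
        parts.append(f"--psm {int(psm)}")
    for name, value in (variables or {}).items():
        parts.append(f"-c {name}={value}")
    return " ".join(parts)


# Ask Tesseract to keep runs of spaces between words instead of collapsing them.
cfg = build_tesseract_config(psm=6, variables={"preserve_interword_spaces": 1})
print(cfg)  # --psm 6 -c preserve_interword_spaces=1
```

The resulting string would then be passed as `pytesseract.image_to_string(image, config=cfg)`, where `image` is the question's image.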

3 answers

error while trying to install tesserocr

I keep getting the same error when I try to install (env) vagrant@vagrant:~$ pip install tesserocr Collecting tesserocr Using cached tesserocr-2.1.3.tar.gz Building wheels for collected packages: tesserocr Running setup.py bdist_wheel for…

python-tesseract

asked Apr 19 '17 at 04:49

KSar

2 answers

Tesseract - unable to recognize Greek letters at all

I am trying to automatically extract a scale (scale bar + a number + unit) from an image. Here is an example: It is used to map pixels to real world measurement. I am using PyTesseract (installed through Anaconda3). Here is my code: import…

python ocr tesseract training-data python-tesseract

asked Nov 01 '20 at 13:56

rbaleksandar

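
For the scale-bar question, one assumption-laden sketch: even when the μ glyph is recognized, it may come back as U+03BC (Greek small letter mu), U+00B5 (micro sign), or a plain `u`, so parsing should accept all three. Recognizing Greek glyphs at all generally requires Greek language data, e.g. `lang="eng+ell"` with `ell.traineddata` installed (an assumption about the asker's setup). The names `parse_scale_label` and `micrometres_per_pixel` are hypothetical, and the bar length in pixels is assumed to be measured separately.

```python
import re

# The micro prefix may come back from OCR as U+00B5 (micro sign),
# U+03BC (Greek small letter mu) or a plain ASCII "u".
_MICRO = "\u00b5\u03bcu"
_SCALE_RE = re.compile(rf"(\d+(?:[.,]\d+)?)\s*([{_MICRO}]m|nm|mm|cm|m)\b")

# Length of one unit in micrometres, after normalizing the micro prefix to "um".
_TO_UM = {"nm": 1e-3, "um": 1.0, "mm": 1e3, "cm": 1e4, "m": 1e6}


def parse_scale_label(text):
    """Return (value, unit) from an OCR'd label such as '50 µm', or None."""
    match = _SCALE_RE.search(text)
    if match is None:
        return None
    value = float(match.group(1).replace(",", "."))  # accept decimal commas
    unit = match.group(2)
    if unit[0] in _MICRO:
        unit = "um"  # all micro variants map to one ASCII spelling
    return value, unit


def micrometres_per_pixel(label_text, bar_length_px):
    """Map pixels to micrometres from the label and the measured bar length."""
    parsed = parse_scale_label(label_text)
    if parsed is None or bar_length_px <= 0:
        return None
    value, unit = parsed
    return value * _TO_UM[unit] / bar_length_px


print(micrometres_per_pixel("50 \u00b5m", 125))  # 0.4
```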

3 answers

Applying user patterns in pytesseract

I'm using pytesseract to try to detect a certain pattern of strings in images. As far as I understand, the correct use of user patterns will help pytesseract make a better scan for a certain pattern of string. However, I can't figure out how to put…

python tesseract python-tesseract

asked Jun 24 '20 at 16:47

aabujamra

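
A sketch for the user-patterns question, under stated assumptions: the tesseract command line accepts `--user-patterns FILE`, and pytesseract passes extra flags through its `config` argument. In Tesseract's pattern syntax `\d` stands for a digit and `\c` for a letter. The helpers `write_user_patterns` and `user_patterns_config` are hypothetical, the temporary file location is arbitrary, and whether the patterns actually change the result may depend on the engine mode (`--oem`) and Tesseract version.

```python
import os
import tempfile


def write_user_patterns(patterns, directory=None):
    r"""Write one Tesseract user pattern per line and return the file path.

    Hypothetical helper. Example pattern: \c\c\d\d\d (two letters, three digits).
    """
    fd, path = tempfile.mkstemp(suffix=".patterns", dir=directory, text=True)
    with os.fdopen(fd, "w", encoding="utf-8") as handle:
        handle.write("\n".join(patterns) + "\n")
    return path


def user_patterns_config(path, psm=6):
    """Config string passing the patterns file via --user-patterns.

    Assumes `path` contains no spaces, because pytesseract splits the
    config string into separate command-line arguments.
    """
    return f"--psm {psm} --user-patterns {path}"


patterns_path = write_user_patterns([r"\c\c\d\d\d"])
print(user_patterns_config(patterns_path))
```

The config would then be used as `pytesseract.image_to_string(image, config=user_patterns_config(patterns_path))`.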

3 answers

How to convert PDF into image readable by opencv-python?

I am using the following code to draw rectangles around text in an image that matches a date pattern, and it's working fine. import re import cv2 import pytesseract from PIL import Image from pytesseract import Output img = cv2.imread('invoice-sample.jpg') d =…

python python-imaging-library tesseract python-tesseract

asked May 16 '20 at 06:46

P.Natu

2 answers

Does anyone know the meaning of the output of the image_to_data and image_to_osd methods of pytesseract?

I'm trying to extract data from an image using pytesseract. This module has image_to_data and image_to_osd methods. These two methods provide a lot of info (TextLineOrder, WritingDirection, ScriptDetection, Orientation, etc.) as output. Below image is…

python ocr python-tesseract

asked Apr 27 '20 at 14:45

Eswar RDS
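
On the `image_to_data` part of this question: by default `image_to_data` returns a tab-separated table with the columns `level`, `page_num`, `block_num`, `par_num`, `line_num`, `word_num`, `left`, `top`, `width`, `height`, `conf` and `text`. `level` encodes the layout hierarchy (1 page, 2 block, 3 paragraph, 4 line, 5 word), the box columns are pixel coordinates, and `conf` is -1 on rows that are not words. The sketch below parses that string with the standard library; `parse_tsv` and `words` are hypothetical helper names.

```python
import csv
import io

# Meaning of the `level` column in Tesseract's TSV output.
LEVELS = {1: "page", 2: "block", 3: "paragraph", 4: "line", 5: "word"}
INT_COLUMNS = ("level", "page_num", "block_num", "par_num", "line_num",
               "word_num", "left", "top", "width", "height")


def parse_tsv(tsv_text):
    """Parse the TSV string from pytesseract.image_to_data into a list of dicts."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t",
                            quoting=csv.QUOTE_NONE)
    rows = []
    for raw in reader:
        row = dict(raw)
        for key in INT_COLUMNS:
            row[key] = int(row[key])
        row["conf"] = float(row["conf"])  # -1 on page/block/paragraph/line rows
        row["text"] = row.get("text") or ""  # non-word rows carry no text
        row["level_name"] = LEVELS.get(row["level"], "unknown")
        rows.append(row)
    return rows


def words(rows, min_conf=0.0):
    """Keep word-level rows that have text and at least `min_conf` confidence."""
    return [row for row in rows
            if row["level"] == 5 and row["text"].strip() and row["conf"] >= min_conf]
```

Passing `output_type=Output.DICT` (with `from pytesseract import Output`) returns the same columns as a dict of lists instead, which avoids the manual parsing.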

2 answers

tesseract 5.0 bazaar + user-words config doesn't work

I tried to force tesseract to use only my word list when performing OCR. First, I copied the bazaar file to /usr/share/tesseract-ocr/5/tessdata/configs/. This is my bazaar file: load_system_dawg F load_freq_dawg F user_words_suffix user-words Then, I…

ocr tesseract python-tesseract

asked Dec 12 '19 at 14:51

voxter

2 answers

Why can't I get a string with PIL and pytesseract?

It is a simple Optical Character Recognition (OCR) program in Python 3 that gets a string from an image. I have uploaded the target GIF file here; please download it and save it as /tmp/target.gif. try: from PIL import Image except ImportError: import…

python python-3.x ocr python-tesseract

asked Jul 24 '19 at 13:22

showkey

1 answer

converting pdf to image but after zooming in

This link shows how PDFs can be converted to images. Is there a way to zoom into my PDFs before converting them to images? In my project, I am converting PDFs to PNGs and then using the Python-tesseract library to extract text. I noticed that if I zoom PDFs and…

image pdf ocr python-tesseract poppler

asked Mar 22 '19 at 17:58

user2543622

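
Zooming before rasterizing amounts to rendering the page at a higher resolution. PDF page sizes are given in points (1/72 inch), so a page rendered at `dpi` dots per inch is `points * dpi / 72` pixels along each side. The sketch below does that arithmetic; `render_size` and `zoom_factor` are hypothetical helpers, and the base of 200 DPI is assumed to be pdf2image's default for `convert_from_path`.

```python
POINTS_PER_INCH = 72  # PDF user-space units are 1/72 inch


def render_size(width_pt, height_pt, dpi):
    """Pixel dimensions of a PDF page of the given size rendered at `dpi`."""
    scale = dpi / POINTS_PER_INCH
    return round(width_pt * scale), round(height_pt * scale)


def zoom_factor(target_dpi, base_dpi=200):
    """How much larger a render at `target_dpi` is than one at `base_dpi`."""
    return target_dpi / base_dpi


# An A4 page is roughly 595 x 842 pt.
print(render_size(595, 842, 200))  # (1653, 2339)
print(render_size(595, 842, 300))  # (2479, 3508)
print(zoom_factor(300))            # 1.5
```

With pdf2image the zoom would then be requested directly, e.g. `convert_from_path("input.pdf", dpi=300)` (file name hypothetical); Tesseract's own documentation suggests that images of at least 300 DPI work best.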

1 answer

How to Create Traineddata file For Tesseract 4.1.0

I want to recognise the characters on number plates. How do I train tesseract-ocr for these number plates on Ubuntu 16.04? I'm not familiar with training, so please help me create a 'traineddata' file for recognizing number plates. I have…

ocr tesseract python-tesseract openalpr automatic-license-plate-recognition

asked Mar 07 '19 at 05:26

Grv

1 answer

Empty string with Tesseract

I'm trying to read different cropped images from a big file. I can read most of them, but some return an empty string when I try to read them with tesseract. The code is just this…

python opencv ocr tesseract python-tesseract

asked Dec 15 '18 at 20:47

Alberto Carmona
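
One common cause of empty results on tight crops, offered as a hedged suggestion: Tesseract's image-quality guidance notes that text touching the image edge, with no border around it, can fail to be recognized. A sketch that widens each crop box by a few pixels of the surrounding image before cropping; `pad_box` is a hypothetical helper using the `(left, upper, right, lower)` box order of Pillow's `Image.crop`, and a 10-pixel pad is an arbitrary choice.

```python
def pad_box(box, pad, image_width, image_height):
    """Grow a (left, top, right, bottom) crop box by `pad` pixels on each side,
    clamped so it never extends past the source image."""
    left, top, right, bottom = box
    return (
        max(0, left - pad),
        max(0, top - pad),
        min(image_width, right + pad),
        min(image_height, bottom + pad),
    )


print(pad_box((5, 40, 120, 70), 10, 640, 480))  # (0, 30, 130, 80)
```

The padded box would be used as `big_image.crop(pad_box(box, 10, *big_image.size))`; for a crop holding a single line of text, `config="--psm 7"` may also help. Pillow's `ImageOps.expand(crop, border=10, fill="white")` is an alternative when the crop already sits at the edge of the source image.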

3 answers

Get orientation pytesseract Python3

I want to get the orientation of a scanned document. I saw the post "Pytesseract OCR multiple config options" and tried to use --psm 0 to get the orientation. target = pytesseract.image_to_string(text, lang='eng', boxes=False, \ config='--psm 0…

python tesseract python-tesseract

asked Aug 13 '18 at 13:13

lads

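
`--psm 0` is Tesseract's orientation-and-script-detection-only mode, and pytesseract exposes it as `image_to_osd`, which returns plain `Key: value` lines rather than recognized text. A sketch that turns that output into a dict; `parse_osd` is a hypothetical helper, and the sample string mirrors the usual OSD field names with made-up values. OSD also assumes `osd.traineddata` is installed.

```python
def parse_osd(osd_text):
    """Turn the 'Key: value' lines printed by Tesseract's OSD mode into a dict."""
    result = {}
    for line in osd_text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # skip blank or malformed lines
        key, value = key.strip(), value.strip()
        try:
            # Angles and confidences become numbers.
            number = float(value)
            result[key] = int(number) if number.is_integer() else number
        except ValueError:
            result[key] = value  # e.g. "Script: Latin"
    return result


# Illustrative OSD output; the numbers are invented for the example.
SAMPLE_OSD = """Page number: 0
Orientation in degrees: 270
Rotate: 90
Orientation confidence: 21.27
Script: Latin
Script confidence: 4.14"""

info = parse_osd(SAMPLE_OSD)
print(info["Orientation in degrees"], info["Rotate"])  # 270 90
```

In practice the input would be `pytesseract.image_to_osd(image)`, and `info["Rotate"]` would hold the rotation Tesseract suggests for making the page upright.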

1 answer

Image Preprocessing for OCR - Tesseract

Obviously this image is pretty tough, as it has low clarity and is not a real word. However, with this code, I'm not detecting anything close: import pytesseract from PIL import Image, ImageEnhance, ImageFilter image_name = 'NedNoodleArms.jpg' im =…

python ocr image-recognition image-preprocessing python-tesseract

asked Aug 04 '18 at 19:33

Ashley O

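
Binarization is one of the preprocessing steps Tesseract's image-quality documentation discusses. As a sketch of one standard global method (Otsu's threshold, swapped in here rather than taken from the question), the code below picks the gray level that best separates dark ink from a light background; `otsu_threshold` and `binarize` are hypothetical helpers operating on a flat list of 0-255 gray values.

```python
def otsu_threshold(pixels):
    """Return the 0-255 threshold maximizing between-class variance (Otsu)."""
    histogram = [0] * 256
    for value in pixels:
        histogram[value] += 1
    total = len(pixels)
    sum_all = sum(i * count for i, count in enumerate(histogram))

    best_threshold, best_variance = 0, -1.0
    weight_bg, sum_bg = 0, 0
    for t in range(256):
        weight_bg += histogram[t]        # pixels with value <= t
        if weight_bg == 0:
            continue
        weight_fg = total - weight_bg    # pixels with value > t
        if weight_fg == 0:
            break
        sum_bg += t * histogram[t]
        mean_bg = sum_bg / weight_bg
        mean_fg = (sum_all - sum_bg) / weight_fg
        variance = weight_bg * weight_fg * (mean_bg - mean_fg) ** 2
        if variance > best_variance:
            best_threshold, best_variance = t, variance
    return best_threshold


def binarize(pixels, threshold):
    """Map each gray value to 0 (ink) or 255 (background)."""
    return [255 if value > threshold else 0 for value in pixels]


sample = [12, 15, 10, 240, 250, 235, 14, 245]
t = otsu_threshold(sample)
print(t, binarize(sample, t))
```

With Pillow the input list could come from `list(img.convert("L").getdata())`; OpenCV offers the same method through `cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)`. A low-clarity source like the one described may also need upscaling first, and no threshold can restore detail that is missing.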

2 answers

Extract text from image using OCR in python

I want to extract text from specific areas of an image, such as the name and ID number on an identity card. The ID card I want to extract text from is in Chinese (a Chinese ID card). I have tried this code, but it just extracts the…

python opencv tesseract python-tesseract pytesser

asked Jul 11 '18 at 04:08

Tehseen
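
Because ID cards use a fixed layout, one sketch for this question is to store each field's position as fractions of the card's width and height, convert them to a pixel box, crop, and OCR only that region. The `FIELDS` coordinates are hypothetical placeholders (not measured from a real card), `field_box` is a hypothetical helper, and the card image is assumed to be already cropped and deskewed.

```python
# Hypothetical field positions as (left, top, right, bottom) fractions of the
# card's width and height. Measure these on your own card images.
FIELDS = {
    "name": (0.18, 0.10, 0.45, 0.20),
    "id_number": (0.33, 0.80, 0.95, 0.92),
}


def field_box(field, image_width, image_height, fields=FIELDS):
    """Convert a fractional region into an integer pixel box for Image.crop."""
    left, top, right, bottom = fields[field]
    return (
        round(left * image_width),
        round(top * image_height),
        round(right * image_width),
        round(bottom * image_height),
    )


print(field_box("name", 1000, 600))  # (180, 60, 450, 120)
```

The box then feeds Pillow's `crop`, e.g. `pytesseract.image_to_string(card.crop(field_box("name", *card.size)), lang="chi_sim")`, assuming `chi_sim.traineddata` is installed; for the ID number, restricting characters with `config="-c tessedit_char_whitelist=0123456789X"` may help.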
