Questions tagged [python-tesseract]

Python-tesseract is a wrapper for Tesseract OCR that reads any conventional image file (JPG, GIF, PNG, TIFF, etc.) and returns its text, returns detailed data about that text, or converts the image to PDF.

Tesseract is advertised as the most accurate open source OCR engine available. It was developed at HP Labs between 1985 and 1995 and then remained dormant until 2006, when Google revived the project.

For more information, please see the Python-tesseract page or the Tesseract page.
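
As a rough illustration of the three outputs the description mentions (plain text, data about the text, and PDF), here is a minimal sketch. It assumes pytesseract, Pillow, and the Tesseract binary are all installed. The helper names `ocr_outputs` and `confident_words`, and the confidence cutoff of 60, are hypothetical choices for this example and are not part of the library.

```python
def ocr_outputs(image_path):
    """Return (text, data, pdf_bytes) for a single image file."""
    # Imported lazily so this module still loads where OCR is not installed.
    import pytesseract
    from PIL import Image

    with Image.open(image_path) as img:
        # Plain recognised text as one string.
        text = pytesseract.image_to_string(img)
        # Per-word boxes, confidences, and text as a dict of parallel lists.
        data = pytesseract.image_to_data(
            img, output_type=pytesseract.Output.DICT
        )
        # Searchable PDF as raw bytes.
        pdf_bytes = pytesseract.image_to_pdf_or_hocr(img, extension="pdf")
    return text, data, pdf_bytes


def confident_words(data, min_conf=60.0):
    """Keep non-empty words whose confidence is at least min_conf.

    `data` is expected to look like the DICT output of image_to_data,
    with parallel "text" and "conf" lists. The cutoff is an assumption.
    """
    words = []
    for word, conf in zip(data["text"], data["conf"]):
        # Confidence may arrive as int, float, or str depending on version.
        if word.strip() and float(conf) >= min_conf:
            words.append(word)
    return words
```

With a placeholder file such as `scan.png`, `ocr_outputs("scan.png")` would give the text, the word-level dictionary, and PDF bytes that can be written to disk in binary mode; `confident_words` can then filter the dictionary.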

1664 questions

votes

1 answer

How to read numbers on screen efficiently (pytesseract)?

I'm trying to read numbers on the screen and for that I'm using pytesseract. The thing is, even though it works, it works slowly and doesn't give good results at all. For example, with this image: I can make this thresholded image: and it reads…

python automation python-tesseract

asked Jan 22 '23 at 19:01

the shadow

votes

0 answers

Extract Key-Value Pair Using OCR

I am trying to extract key-value pairs, e.g. (securities-stock : 0.00), using pytesseract from an image of a personal finance statement. For example: But so far I am not able to get that. How should I rectify my approach so as to extract key value pair from…

python ocr python-tesseract

asked Oct 28 '22 at 10:57

reven06

votes

2 answers

Deskewing an image with background (Python)

I am working on a project where I am doing OCR on text on a label. My job is to deskew the image to make it readable with tesseract. I have been using this approach, that greyscales and thresholds the picture, gets the coordinates of the black…

python opencv ocr tesseract python-tesseract

asked Jan 12 '22 at 18:38

Imogenio

votes

1 answer

read single page .tif files as multipage.tiff from filename

UPDATE: I found out it is unreasonable to create PDF files from OCRed files, so it would be better to leave them as is without conversion. I still have the problem that some images are connected while others are single pages. data = [] listOfPages =…

python tesseract filenames tiff python-tesseract

asked Oct 06 '21 at 10:45

id345678

votes

0 answers

Config for pytesseract (Urdu language)

I am having some problems with pytesseract. With this line of code pytesseract works poorly with the Urdu language: text = pytesseract.image_to_string(img, lang="urd") What configuration should I use to improve the accuracy for the Urdu language? And what…

nlp ocr tesseract python-tesseract urdu

asked Aug 08 '21 at 21:03

Samee Arif

votes

1 answer

Pytesseract Failed loading language 'chi-sim'

I am working with the Python tesseract package with sample code like the following: import pytesseract from PIL import Image tessdata_dir_config = "--tessdata-dir \"/opt/homebrew/Cellar/tesseract-lang/4.1.0/share/tessdata/\"" image =…

python macos tesseract python-tesseract

asked Jul 17 '21 at 13:05

Anemonee

votes

2 answers

Error: tesseract is not installed or it's not in your PATH

I am new to pytesseract and OCR, and I found on the internet that these are the tools used to extract text from images. But I have no prior knowledge of these tools. Right now, I am having this error: tesseract is not installed or it's not…

python tesseract python-tesseract

asked Jun 22 '21 at 13:01

SmitShah_19
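
One common cause of this error is that the pytesseract Python package is installed while the separate Tesseract executable is missing or not on PATH. Below is a minimal sketch of checking for the executable and pointing pytesseract at it. `find_tesseract` and `configure_pytesseract` are hypothetical helper names for this example; `pytesseract.pytesseract.tesseract_cmd` is the attribute the library uses to locate the binary.

```python
import shutil


def find_tesseract(names=("tesseract",)):
    """Return the full path of the first name found on PATH, else None."""
    for name in names:
        path = shutil.which(name)
        if path:
            return path
    return None


def configure_pytesseract(path):
    """Tell pytesseract which executable to run."""
    # Imported lazily so the lookup above works without pytesseract.
    import pytesseract

    pytesseract.pytesseract.tesseract_cmd = path


if __name__ == "__main__":
    found = find_tesseract()
    if found is None:
        print("Tesseract executable not found on PATH; install it first.")
    else:
        configure_pytesseract(found)
        print(f"Using Tesseract at {found}")
```

If the binary lives outside PATH, its full path can be passed to `configure_pytesseract` directly instead of relying on the lookup.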

votes

1 answer

How to improve the OCR accuracy in this image?

I am going to extract text from a picture using OpenCV in Python and OCR by pytesseract. I have an image like this: Then I have written some code to extract the text from that picture, but it does not have enough accuracy to extract the text…

python opencv image-processing ocr python-tesseract

asked May 02 '21 at 20:45

FATEGH

votes

0 answers

Tesseract Not Found error occurring in IDLE but not in Terminal

I am working on some OCR and chose to use tesseract as the library. So, I installed it using the pip command in the terminal, and when I tested the library with a sample image it seemed to be working fine (in the terminal). I have no idea why it…

python ocr tesseract python-tesseract

asked May 01 '21 at 15:09

wierd23

votes

1 answer

Why can't pytesseract handle OSD mode?

I can't run OSD mode in pytesseract in a Docker image on Ubuntu. On Windows, this command works like a charm: pytesseract.image_to_osd(image) But inside the Docker image, it causes the following error. What I want to achieve is reading the rotation info…

python ocr tesseract python-tesseract

asked Apr 09 '21 at 09:44

troger19

votes

1 answer

Improving accuracy in Python Tesseract OCR

I am using pytesseract along with openCV in a simple django application in Python to extract text in Bengali language from image files. I have a form that lets you upload an image and on clicking the submit button sends it to the server side in an…

django opencv ocr tesseract python-tesseract

asked Apr 04 '21 at 23:50

Istiaque Ahmed

votes

1 answer

pytesseract.pytesseract.TesseractNotFoundError: tesseract is not installed or it's not in your PATH. See README file for more information

import pytesseract from PIL import Image img = Image.open('image1.jpg') result = pytesseract.image_to_string(img) print(result) My question may be similar to this and this. But there's no helpful answer for me... Error: Traceback (most recent…

python error-handling python-tesseract

asked Mar 16 '21 at 16:17

user15336431

votes

2 answers

Pytesseract image_to_pdf_or_hocr function not working in AWS Lambda

I am using this repository to deploy tesseract as a lambda layer: https://github.com/bweigel/aws-lambda-tesseract-layer The deployment works well, and other pytesseract functions such as image_to_string and image_to_data also work well without…

python aws-lambda tesseract python-tesseract

asked Mar 15 '21 at 06:39

Pramesh Bajracharya

votes

2 answers

Pytesseract doesn't recognize decimal points

I'm trying to read the text in this image that contains also decimal points and decimal numbers in this way: img = cv2.imread(path_to_image) print(pytesseract.image_to_string(img)) and what I get is: 73-82 Primo: 50 — I've tried to specify also…

python opencv ocr tesseract python-tesseract

asked Mar 06 '21 at 11:58

marco
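
One thing sometimes tried for numeric fields like this is restricting the recognised characters and the page segmentation mode through the `config` string. The sketch below is only an illustration under assumptions: `--psm 7` tells Tesseract to treat the image as a single text line, and `tessedit_char_whitelist` limits the allowed characters, but whether either helps depends on the image and on the Tesseract version (whitelist support has varied between releases). `numeric_config` and `read_number` are hypothetical helper names.

```python
def numeric_config(psm=7, allowed="0123456789."):
    """Build a Tesseract config string for digit-only text."""
    return f"--psm {psm} -c tessedit_char_whitelist={allowed}"


def read_number(image, psm=7):
    """Run OCR on an image (PIL image or NumPy array) with the numeric config."""
    # Imported lazily so numeric_config can be used without pytesseract.
    import pytesseract

    raw = pytesseract.image_to_string(image, config=numeric_config(psm=psm))
    return raw.strip()
```

Including the period in `allowed` keeps decimal points eligible for recognition; dropping it would force integer-only output.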

votes

1 answer

Tesseract installed via Homebrew, Anaconda says no module 'pytesseract'?

Sorry, complete newbie question here... I installed tesseract and tesseract-lang, both via Homebrew, and also via the terminal (using conda install https://anaconda.org/conda-forge/tesseract ). In the terminal it looks like it is installed as I get this…

python anaconda tesseract python-tesseract

asked Mar 03 '21 at 16:18

T-RevLey
