Questions tagged [python-tesseract]

Python-tesseract is a wrapper for the Tesseract OCR engine. It can read any conventional image file (JPG, GIF, PNG, TIFF, etc.) and return the text it contains, word-level data about that text (such as positions and confidences), or even convert the image to a searchable PDF.

Tesseract is advertised as the most accurate open source OCR engine available. It was developed at HP Labs between 1985 and 1995 and then remained dormant until 2006 when Google revived the project.

For more information, please see the Python-tesseract page or the Tesseract page.

1664 questions
3 votes · 1 answer

How to read numbers on screen efficiently (pytesseract)?

I'm trying to read numbers on the screen, and for that I'm using pytesseract. The thing is, even though it works, it is slow and doesn't give good results at all. For example, with this image: I can make this thresholded image: and it reads…
the shadow (reputation 65)
3 votes · 0 answers

Extract Key-Value Pair Using OCR

I am trying to extract key-value pairs (e.g. securities-stock : 0.00) using pytesseract from an image of a personal finance statement, but so far I have not been able to. How should I rectify my approach so as to extract key-value pairs from…
reven06 (reputation 31)
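One possible approach, assuming each key and its value are printed on the same text line: request word-level data with `pytesseract.image_to_data(img, output_type=pytesseract.Output.DICT)`, whose parallel lists include `text`, `block_num`, `par_num` and `line_num`, rebuild the lines, and split each one on its first colon. The helper names and the sample dict below are hypothetical stand-ins for real OCR output.

```python
from __future__ import annotations


def lines_from_ocr_data(data: dict) -> list[str]:
    """Rebuild text lines from an image_to_data-style dict of parallel lists."""
    lines: dict[tuple[int, int, int], list[str]] = {}
    for i, word in enumerate(data["text"]):
        word = word.strip()
        if not word:
            continue  # non-word rows (page/block/paragraph/line levels) have empty text
        key = (data["block_num"][i], data["par_num"][i], data["line_num"][i])
        lines.setdefault(key, []).append(word)
    # dicts keep insertion order, so lines come out in the order Tesseract reported them
    return [" ".join(words) for words in lines.values()]


def key_value_pairs(lines: list[str], sep: str = ":") -> dict[str, str]:
    """Split each line on the first separator into a key/value pair."""
    pairs: dict[str, str] = {}
    for line in lines:
        if sep in line:
            key, value = line.split(sep, 1)
            pairs[key.strip()] = value.strip()
    return pairs


# Hypothetical sample shaped like image_to_data(..., output_type=Output.DICT);
# real output has more keys (left, top, conf, ...), which are ignored here.
SAMPLE = {
    "text": ["", "securities-stock", ":", "0.00", "", "cash", ":", "12.50"],
    "block_num": [1, 1, 1, 1, 1, 1, 1, 1],
    "par_num": [1, 1, 1, 1, 1, 1, 1, 1],
    "line_num": [1, 1, 1, 1, 2, 2, 2, 2],
}

if __name__ == "__main__":
    print(key_value_pairs(lines_from_ocr_data(SAMPLE)))
```

If keys and values sit in separate columns, Tesseract may put them in different blocks; grouping words by their `top` coordinate instead is one way around that.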
3 votes · 2 answers

Deskewing an image with background (Python)

I am working on a project where I am doing OCR on the text of a label. My job is to deskew the image to make it readable by Tesseract. I have been using this approach, which greyscales and thresholds the picture, gets the coordinates of the black…
Imogenio (reputation 31)
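The approach described (threshold, then collect the coordinates of the dark pixels) ends with an angle estimate; a common OpenCV route is `cv2.minAreaRect` on those coordinates. The sketch below swaps that for a second-moment (principal-axis) estimate in plain Python, so the function name and technique are illustrative assumptions, not the asker's code. Both methods are thrown off by dark background pixels, so with a busy background, masking or cropping to the label first is likely needed.

```python
from __future__ import annotations

import math


def skew_angle(points: list[tuple[float, float]]) -> float:
    """Estimate text skew in degrees from (x, y) coordinates of dark pixels.

    Uses the orientation of the point cloud's principal axis:
    theta = 0.5 * atan2(2 * cov_xy, cov_xx - cov_yy).
    Assumes the text region is wider than it is tall. In image coordinates
    (y grows downward), a positive result means the text descends to the right.
    """
    n = len(points)
    if n < 2:
        raise ValueError("need at least two points")
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    cov_xx = sum((x - mean_x) ** 2 for x, _ in points) / n
    cov_yy = sum((y - mean_y) ** 2 for _, y in points) / n
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in points) / n
    return math.degrees(0.5 * math.atan2(2 * cov_xy, cov_xx - cov_yy))


if __name__ == "__main__":
    # Synthetic points along a line tilted by 10 degrees.
    pts = [(x, x * math.tan(math.radians(10))) for x in range(100)]
    print(round(skew_angle(pts), 3))
```

Whether the image should then be rotated by `angle` or `-angle` depends on the rotate function's sign convention (Pillow's `Image.rotate` turns counter-clockwise for positive angles), so it is worth checking on one sample image.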
3 votes · 1 answer

Read single-page .tif files as multipage .tiff from filename

UPDATE: I found out that it is unreasonable to create PDF files from OCRed files, so it would be better to leave them as they are, without conversion. I still have the problem that some images are connected while others are one-pagers. data = [] listOfPages =…
id345678 (reputation 97)
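If the single-page files belong together only by filename, one option is to group them before OCR. The `<document>_page<N>.tif` naming convention and the `group_pages` helper are hypothetical; the regular expression would need to be adapted to the real filenames.

```python
from __future__ import annotations

import re
from collections import defaultdict

# Hypothetical naming convention: "<document>_page<N>.tif" (or .tiff).
# Files without the "_page<N>" suffix are treated as one-page documents.
PAGE_RE = re.compile(r"^(?P<doc>.+?)_page(?P<page>\d+)\.tiff?$", re.IGNORECASE)


def group_pages(filenames: list[str]) -> dict[str, list[str]]:
    """Group single-page TIFF filenames into ordered per-document page lists."""
    groups: defaultdict[str, list[tuple[int, str]]] = defaultdict(list)
    for name in filenames:
        match = PAGE_RE.match(name)
        if match:
            groups[match["doc"]].append((int(match["page"]), name))
        else:
            stem = name.rsplit(".", 1)[0]
            groups[stem].append((1, name))
    # Sort numerically so "page10" comes after "page9", not after "page1".
    return {doc: [name for _, name in sorted(pages)] for doc, pages in groups.items()}


if __name__ == "__main__":
    files = ["report_page2.tif", "report_page10.tif", "report_page1.tif", "memo.tif"]
    print(group_pages(files))
```

Each group can then be OCRed page by page in order; if an actual multipage file is wanted, Pillow can write one with `first.save(path, save_all=True, append_images=rest)`.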
3 votes · 0 answers

Config for pytesseract (Urdu language)

I am having some problems with pytesseract. With this line of code, pytesseract works poorly on Urdu text: text = pytesseract.image_to_string(img, lang="urd") What configuration should I use to improve accuracy for Urdu? And what…
Samee Arif (reputation 61)
3 votes · 1 answer

Pytesseract Failed loading language 'chi-sim'

I am working with the Python tesseract package, using sample code like the following: import pytesseract from PIL import Image tessdata_dir_config = "--tessdata-dir \"/opt/homebrew/Cellar/tesseract-lang/4.1.0/share/tessdata/\"" image =…
Anemonee (reputation 33)
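Tesseract language codes use underscores (`chi_sim` for Simplified Chinese, `urd` for Urdu), and each code needs a matching `<code>.traineddata` file in the tessdata directory, so `chi-sim` with a hyphen will not load. A small stdlib check, with hypothetical helper names, could look like this:

```python
from __future__ import annotations

import os


def normalize_lang(lang: str) -> str:
    """Replace hyphens with the underscores Tesseract language codes use."""
    return lang.replace("-", "_")


def missing_traineddata(lang: str, tessdata_dir: str) -> list[str]:
    """List the <code>.traineddata files a `lang` string needs but are absent.

    `lang` may join several codes with "+", e.g. "eng+chi_sim".
    """
    missing = []
    for code in lang.split("+"):
        filename = f"{code}.traineddata"
        if not os.path.isfile(os.path.join(tessdata_dir, filename)):
            missing.append(filename)
    return missing


if __name__ == "__main__":
    # Hypothetical path; point this at the directory passed to --tessdata-dir.
    tessdata = "/path/to/tessdata"
    for lang in ("chi-sim", normalize_lang("chi-sim")):
        print(lang, "missing:", missing_traineddata(lang, tessdata))
```

Running `tesseract --list-langs` prints the languages the engine can actually find, which is a quick cross-check.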
3 votes · 2 answers

Error: tesseract is not installed or it's not in your PATH

I am new to pytesseract and OCR. From searching online, I learned that these are the tools used to extract text from images, but I have no prior knowledge of them. Right now, I am getting this error: tesseract is not installed or it's not…
SmitShah_19 (reputation 117)
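pytesseract only wraps the tesseract executable, which has to be installed separately; when it is not on the PATH seen by the running Python process, the pytesseract README suggests setting `pytesseract.pytesseract.tesseract_cmd` to its full path. The sketch below looks for the binary; the candidate paths are typical install locations rather than guaranteed ones, and `find_tesseract` is a hypothetical helper. A process started from a GUI launcher (such as IDLE) may also see a different PATH than a terminal shell.

```python
from __future__ import annotations

import os
import shutil
from typing import Iterable, Optional

# Typical install locations (assumptions, not guaranteed for every setup).
DEFAULT_CANDIDATES = (
    r"C:\Program Files\Tesseract-OCR\tesseract.exe",  # common Windows installer default
    "/opt/homebrew/bin/tesseract",  # Homebrew on Apple Silicon
    "/usr/local/bin/tesseract",  # Homebrew on Intel macOS, source builds
    "/usr/bin/tesseract",  # Debian/Ubuntu packages
)


def find_tesseract(candidates: Iterable[str] = DEFAULT_CANDIDATES) -> Optional[str]:
    """Return a path to the tesseract executable, or None if none is found."""
    # First check the PATH this Python process actually sees.
    on_path = shutil.which("tesseract")
    if on_path:
        return on_path
    # Otherwise fall back to known locations that exist and are executable.
    for candidate in candidates:
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            return candidate
    return None


if __name__ == "__main__":
    print(find_tesseract() or "tesseract not found; install it or set tesseract_cmd")
```

If `find_tesseract()` returns a path, assigning it to `pytesseract.pytesseract.tesseract_cmd` before the first OCR call should avoid `TesseractNotFoundError`; `None` means the engine itself still needs to be installed.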
3 votes · 1 answer

How to improve the OCR accuracy in this image?

I want to extract text from a picture using OpenCV in Python and OCR with pytesseract. I have an image like this: Then I wrote some code to extract the text from that picture, but it is not accurate enough to extract the text…
FATEGH (reputation 51)
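Binarizing the image before OCR is a common accuracy fix. With OpenCV this is usually `cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)`; the pure-Python sketch below shows what Otsu's method computes, taking a flat list of 8-bit grey values (an input format assumed here only to keep the example dependency-free).

```python
from __future__ import annotations


def otsu_threshold(pixels: list[int]) -> int:
    """Return the Otsu threshold for 8-bit grey values (0-255).

    Values <= threshold form one class and values > threshold the other;
    the returned threshold maximises the between-class variance.
    """
    hist = [0] * 256
    for p in pixels:
        if not 0 <= p <= 255:
            raise ValueError(f"pixel value out of range: {p}")
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(value * count for value, count in enumerate(hist))
    best_t, best_between = 0, -1.0
    weight_low = 0
    sum_low = 0
    for t in range(256):
        weight_low += hist[t]
        sum_low += t * hist[t]
        weight_high = total - weight_low
        if weight_low == 0:
            continue
        if weight_high == 0:
            break
        mean_low = sum_low / weight_low
        mean_high = (sum_all - sum_low) / weight_high
        # Between-class variance, up to a constant factor of 1 / total**2.
        between = weight_low * weight_high * (mean_low - mean_high) ** 2
        if between > best_between:
            best_t, best_between = t, between
    return best_t


def binarize(pixels: list[int], threshold: int) -> list[int]:
    """Map values above the threshold to white (255) and the rest to black (0)."""
    return [255 if p > threshold else 0 for p in pixels]
```

The threshold picks the cut that best separates dark and light pixels, so it works best when text and background form two distinct brightness peaks; with uneven lighting, adaptive thresholding (`cv2.adaptiveThreshold`) is usually the better fit.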
3 votes · 0 answers

Tesseract Not Found error occurring in IDLE but not in Terminal

I am working on some OCR and chose to use tesseract as the library, so I installed it with pip in the terminal. When I tested the library with a sample image, it seemed to work fine (in the terminal). I have no idea why it…
wierd23 (reputation 41)
3 votes · 1 answer

Why can't pytesseract handle OSD mode?

I can't run OSD mode in pytesseract in a Docker image on Ubuntu. On Windows, this command works like a charm: pytesseract.image_to_osd(image) But inside the Docker image, it causes the following error. What I want to achieve is reading the rotation info…
troger19 (reputation 1,159)
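`pytesseract.image_to_osd` relies on the `osd.traineddata` model, so one common cause of OSD failing inside a Docker image while working elsewhere is that this file is missing from the image's tessdata directory (Debian and Ubuntu package it as `tesseract-ocr-osd`). Once OSD runs, its text output is a series of `Key: value` lines, and a small parser can pull out the rotation. The `parse_osd` helper and sample text are illustrative, and the exact fields may vary by Tesseract version.

```python
from __future__ import annotations


def parse_osd(osd_text: str) -> dict[str, object]:
    """Parse the `Key: value` lines of Tesseract OSD output into a dict.

    Numeric values become int or float; everything else stays a string.
    """
    result: dict[str, object] = {}
    for line in osd_text.splitlines():
        if ":" not in line:
            continue
        key, value = (part.strip() for part in line.split(":", 1))
        for cast in (int, float):
            try:
                result[key] = cast(value)
                break
            except ValueError:
                continue
        else:
            result[key] = value  # neither int nor float: keep the string
    return result


# Illustrative text of the kind image_to_osd returns; real values will differ.
SAMPLE_OSD = """Page number: 0
Orientation in degrees: 270
Rotate: 90
Orientation confidence: 2.57
Script: Latin
Script confidence: 1.67"""

if __name__ == "__main__":
    info = parse_osd(SAMPLE_OSD)
    print(info["Rotate"], info["Script"])
```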
3 votes · 1 answer

Improving accuracy in Python Tesseract OCR

I am using pytesseract along with OpenCV in a simple Django application in Python to extract Bengali text from image files. I have a form that lets you upload an image and, on clicking the submit button, sends it to the server side in an…
Istiaque Ahmed (reputation 6,072)
3 votes · 1 answer

pytesseract.pytesseract.TesseractNotFoundError: tesseract is not installed or it's not in your PATH. See README file for more information

import pytesseract from PIL import Image img = Image.open('image1.jpg') result = pytesseract.image_to_string(img) print(result) My question may be similar to this and this, but there's no helpful answer for me... Error: Traceback (most recent…
user15336431
3 votes · 2 answers

Pytesseract image_to_pdf_or_hocr function not working in AWS Lambda

I am using the repository at https://github.com/bweigel/aws-lambda-tesseract-layer to deploy tesseract as a Lambda layer. The deployment works well, and other pytesseract functions, such as image_to_string and image_to_data, also work well without…
Pramesh Bajracharya (reputation 2,153)
3 votes · 2 answers

Pytesseract doesn't recognize decimal points

I'm trying to read the text in this image, which also contains decimal points and decimal numbers, in this way: img = cv2.imread(path_to_image) print(pytesseract.image_to_string(img)) and what I get is: 73-82 Primo: 50 — I've also tried to specify…
marco (reputation 525)
3 votes · 1 answer

Tesseract installed via Homebrew, Anaconda says no module 'pytesseract'?

Sorry, complete newbie question here... I installed both tesseract and tesseract-lang via Homebrew, and also via the terminal (using conda install https://anaconda.org/conda-forge/tesseract). In the terminal it looks like it is installed, as I get this…
T-RevLey (reputation 31)
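Homebrew's `tesseract` and `tesseract-lang` formulae install the OCR engine and its language data, not the `pytesseract` Python package. That package has to be installed into the same environment as the interpreter being used (for example with `pip install pytesseract` inside the Anaconda environment). A stdlib-only check of what the current interpreter can see, using a hypothetical `diagnose` helper:

```python
from __future__ import annotations

import importlib.util
import shutil
import sys


def diagnose() -> dict[str, object]:
    """Report which pieces of a pytesseract setup this interpreter can see."""
    return {
        # The interpreter actually running this code (e.g. an Anaconda env).
        "python": sys.executable,
        # The Python wrapper package, installed with pip or conda, not Homebrew.
        "pytesseract_importable": importlib.util.find_spec("pytesseract") is not None,
        # The Tesseract engine binary that pytesseract calls, if on PATH.
        "tesseract_binary": shutil.which("tesseract"),
    }


if __name__ == "__main__":
    for key, value in diagnose().items():
        print(f"{key}: {value}")
```

If `pytesseract_importable` is `False` while `tesseract_binary` shows a path, the engine is fine and only the Python package is missing from that environment.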