Questions tagged [python-tesseract]

Python-tesseract (pytesseract) is a Python wrapper for the Tesseract OCR engine. It reads conventional image files (JPG, GIF, PNG, TIFF, etc.) and returns the recognized text, structured data about that text, or a PDF conversion of the image.

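As a rough illustration of the three output types mentioned above (text, data about the text, and PDF), here is a minimal sketch. It assumes the `pytesseract` and `Pillow` packages are installed and that a Tesseract binary is on the PATH; the helper name `ocr_image` and the `"eng"` default are hypothetical, not part of the library.

```python
def ocr_image(image_path, lang="eng"):
    """Return (text, word-level data, PDF bytes) for one image file.

    Hypothetical helper for illustration only.
    """
    # Imported lazily so the module loads even where the OCR stack is missing.
    import pytesseract
    from PIL import Image

    with Image.open(image_path) as image:
        # Plain recognized text as a single string.
        text = pytesseract.image_to_string(image, lang=lang)
        # Per-word boxes and confidences as a dict of lists.
        data = pytesseract.image_to_data(
            image, lang=lang, output_type=pytesseract.Output.DICT
        )
        # Searchable PDF returned as raw bytes.
        pdf_bytes = pytesseract.image_to_pdf_or_hocr(image, lang=lang, extension="pdf")
    return text, data, pdf_bytes
```

The PDF bytes can then be written to disk with an ordinary binary file write.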

Tesseract is advertised as the most accurate open source OCR engine available. It was developed at HP Labs between 1985 and 1995 and then remained dormant until 2006 when Google revived the project.

For more information, please see the Python-tesseract page or the Tesseract page.

1664 questions
6
votes
1 answer

Cannot make Tesseract work in Google App Engine with Python 3

I am trying to deploy an application on Google App Engine that also has an OCR function. I installed Tesseract using Homebrew and use pytesseract to wrap it in Python. The OCR function works on my local system, but it does not when I upload the…
6
votes
1 answer

How to process and extract text from image

I'm trying to extract text from an image using Python and cv2. The result is pathetic and I can't figure out a way to improve my code. I believe the image needs to be processed before the text is extracted, but I'm not sure how. I've tried to convert it into…
idar
  • 614
  • 6
  • 13
6
votes
1 answer

How to set config load_system_dawg when using pytesseract to improve result?

I am trying to improve the result by changing params using pytesseract config. I am wondering if there is a possibility to change load_system_dawg and load_freq_dawg as specified in…
Robin White
  • 159
  • 2
  • 11
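One common way to pass Tesseract variables through pytesseract is the `-c name=value` syntax in the `config` string. The sketch below only builds such a string; the helper name `build_tesseract_config` is hypothetical, and whether init-time variables such as `load_system_dawg` actually take effect when set this way may depend on the Tesseract version, so treat this as an assumption to verify.

```python
def build_tesseract_config(variables):
    """Turn a dict of Tesseract variables into a '-c name=value' config string.

    Hypothetical helper for illustration only.
    """
    parts = []
    for name, value in variables.items():
        # Tesseract accepts 0/1 for boolean variables.
        if isinstance(value, bool):
            value = int(value)
        parts.append(f"-c {name}={value}")
    return " ".join(parts)


config = build_tesseract_config(
    {"load_system_dawg": False, "load_freq_dawg": False}
)
# The string would then be passed along as, for example:
#   pytesseract.image_to_string(image, config=config)
```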
6
votes
3 answers

How to extract dotted text from image?

I'm working on my bachelor's degree final project and I want to create an OCR for bottle inspection with python. I need some help with text recognition from the image. Do I need to apply the cv2 operations in a better way, train tesseract or should…
6
votes
2 answers

How to make bounding box around text-areas in an image? (Even if text is skewed!!)

I am trying to detect and grab text from a screenshot taken from any consumer product's ad. My code works at a certain accuracy but fails to make bounding boxes around the skewed text area. Recently I tried Google Vision API and it makes bounding…
6
votes
2 answers

Why does tesseract fail to read text off this simple image?

I have read mountains of posts on pytesseract, but I cannot get it to read text off a dead simple image; it returns an empty string. Here is the image: I have tried scaling it, grayscaling it, and adjusting the contrast, thresholding, blurring,…
hegash
  • 833
  • 1
  • 7
  • 16
6
votes
1 answer

Tesseract 3.x multiprocessing weird behaviour

I am not sure whether it is my infrastructure that does this weird stuff or tesseract-ocr itself. Whenever I use image_to_string in a single-process environment, tesseract-ocr works fine. But when I spawn multiple workers with gunicorn and…
Laimonas Sutkus
  • 3,247
  • 2
  • 26
  • 47
6
votes
1 answer

How to get better/accurate results with OCR from low resolution images

I've written a script in Python using pytesseract to get the text embedded in an image. When I run my script, the scraper does its job weirdly, meaning the text I get as a result is quite different from what is in the image. Script I've tried…
SIM
  • 21,997
  • 5
  • 37
  • 109
6
votes
1 answer

pip install tesserocr fails with error "Failed building wheel for tesserocr"

I already have the latest builds of leptonica and tesseract: tesseract 4.00.00alpha-365-gcf0b378, leptonica-1.74.1, libjpeg 8d (libjpeg-turbo 1.3.0) : libpng 1.2.50 : libtiff 4.0.3 : zlib 1.2.8. I have also installed all dependencies like…
ajack13
  • 627
  • 1
  • 7
  • 10
6
votes
1 answer

get Font Size in Python with Tesseract and Pyocr

Is it possible to get font size from an image using pyocr or Tesseract? Below is my code. tools = pyocr.get_available_tools() tool = tools[0] txt = tool.image_to_string( Imagee.open(io.BytesIO(req_image)), lang=lang, …
Witcher
  • 63
  • 1
  • 1
  • 5
6
votes
2 answers

Image to text recognition using Tesseract-OCR is better when Image is preprocessed manually using Gimp than my Python Code

I am trying to write code in Python for the manual Image preprocessing and recognition using Tesseract-OCR. Manual process: For manually recognizing text for a single Image, I preprocess the Image using Gimp and create a TIF image. Then I feed it to…
Hussain
  • 5,057
  • 6
  • 45
  • 71
6
votes
2 answers

Tesseract quiet mode

Under Ubuntu I use tesseract-ocr version 3.02, specifically through the pytesseract wrapper for Python, but this question is also about the command-line tool. In the FAQ…
Texmex
  • 63
  • 1
  • 6
5
votes
1 answer

How do I add Tesseract to my Docker container so I can use pytesseract

I am working on a project that requires me to run pytesseract in a Docker container, but I am unable to install Tesseract in the container. I also don't know what the file path for pytesseract should be. My Dockerfile: FROM python:3 ENV…
s_h
  • 51
  • 1
  • 2
5
votes
1 answer

How to install Tesseract OCR on Databricks

I am trying to run the following script in a Databricks Python notebook: pip install presidio-image-redactor pip install pytesseract python -m spacy download en_core_web_lg from PIL import Image from presidio_image_redactor import…
5
votes
0 answers

How can I fine-tune Tesseract on a custom dataset?

I know this question may not be new, but training/fine-tuning Tesseract is one of the hardest parts, and I could never find an article that explains it properly. None of the tutorials or docs explain it completely; going through them…
user_12
  • 1,778
  • 7
  • 31
  • 72