Questions tagged [python-tesseract]

Python-tesseract is a wrapper for the Tesseract OCR engine that reads any conventional image file (JPG, GIF, PNG, TIFF, etc.) and returns its text or per-word data, or converts it to a PDF.

Python-tesseract is a wrapper class for OCR that allows any conventional image files (JPG, GIF, PNG, TIFF, etc.) to be read and decoded into usable text.
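
As a rough illustration of the three kinds of output described above (plain text, per-word data, and PDF), here is a minimal sketch. It assumes the `pytesseract` and `Pillow` packages and the Tesseract binary are installed. The names `read_image`, `words_with_boxes`, and `min_conf` are hypothetical helpers invented for this example, not part of the library.

```python
def read_image(path):
    """Return (text, data, pdf_bytes) for one image file.

    Hypothetical helper: needs pytesseract, Pillow, and the Tesseract binary.
    """
    # Imported here so the pure helper below can be tried without Tesseract.
    from PIL import Image
    import pytesseract
    from pytesseract import Output

    image = Image.open(path)
    text = pytesseract.image_to_string(image)                         # plain text
    data = pytesseract.image_to_data(image, output_type=Output.DICT)  # per-word data
    pdf_bytes = pytesseract.image_to_pdf_or_hocr(image, extension="pdf")
    return text, data, pdf_bytes


def words_with_boxes(data, min_conf=0.0):
    """Pick non-empty words and their (left, top, width, height) boxes
    out of the dictionary returned by image_to_data."""
    words = []
    for i, word in enumerate(data["text"]):
        if not word.strip():
            continue  # skip block/line entries that carry no text
        if float(data["conf"][i]) < min_conf:
            continue  # skip words below the chosen confidence
        box = (data["left"][i], data["top"][i],
               data["width"][i], data["height"][i])
        words.append((word, box))
    return words
```

Calling `words_with_boxes(read_image("scan.png")[1], min_conf=60)` would list the words Tesseract is reasonably sure about; the file name and the threshold are placeholders.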

Tesseract is advertised as the most accurate open source OCR engine available. It was developed at HP Labs between 1985 and 1995 and then remained dormant until 2006 when Google revived the project.

For more information, please see the Python-tesseract page or the Tesseract page.

1664 questions
0 votes, 1 answer

Segmenting image files with text (and pictures) into blocks

I'm trying to create bounding boxes for the text in an image I have. An example is the one below. I would like to add a bounding box around each "This is a test" line. Unfortunately I'm not sure why this method is not automatically identifying the…
Black
0 votes, 1 answer

Extract Data from an Image with Python/OpenCV/Tesseract?

I'm trying to extract some contents from a cropped image. I tried pytesseract and opencv template matching but the results are very poor. OpenCV template matching sometimes fails due to poor quality of the icons and tesseract gives me a line of text…
roosevelt
0 votes, 0 answers

How do I measure text size to extract only certain letters from an image in OpenCV?

I am looking to extract text from a license plate. For now I have been using pytesseract with opencv to zero in on the relevant contours and pull out text. This works decently for non-American plates, but I am curious about applying this to American…
LoF10
0 votes, 1 answer

Access denied, Permission error in Windows [win error 5]

Whenever I run this code, this error occurs. I tried python -m pip install --user pyforms too, but nothing worked for me. I am using Anaconda -> Jupyter Notebook, Python 3.7, and I have imported pytesseract already. Please explain if you can what actually…
0 votes, 1 answer

Pytesseract with PySpark throws error: pytesseract module not found

I am trying to write OCR code using Spark and pytesseract, and I am running into a "pytesseract module not found" error even though the pytesseract module is installed. import pytesseract from PIL import Image path='/XXXX/JupyterLab/notebooks/testdir' rdd…
user2844511
0 votes, 1 answer

Converting image identified by PyTesseract to an array

I have an image with a list of numbers which I have scanned using PyTesseract to construct a string. Concretely, here is the code: from PIL import Image import pytesseract from scipy import stats import numpy as…
0 votes, 0 answers

Moroccan License Plate Recognition (LPR) using OpenALPR, OpenCV and Tesseract

I am currently working on my final-year project, titled "Real-time detection of Moroccan vehicle license plates (ALPR)". I tried to post this issue in the ALPR group but unfortunately received no reply. I…
0 votes, 1 answer

Error when trying to use custom tessdata file

I generated a box file from a PNG image, then followed this tutorial: https://pretius.com/how-to-prepare-training-files-for-tesseract-ocr-and-improve-characters-recognition/ to generate a custom traineddata file. I encountered an error when I…
Andrew
0 votes, 1 answer

Improve Pytesseract reliability of reading text

I'm trying to read relatively clear numbers from a screenshot, but I am running into issues getting pytesseract to read the text correctly. I have the following screenshot: And I know the score (2-0) and the clock (1:42) are going to be in the…
0 votes, 0 answers

Pytesseract Image OCR - digits not recognized

I have this image. Running pytesseract with Python 3.8 produced the following problems: the word "phone" is read as O (not zero, O as in Oscar); the word "Fax" is read as 2%; the phone number is read as (56031770. The image in consideration does not…
Sean
0 votes, 1 answer

Tesseract Not Found in Terminal

I have a problem with tesseract (I'm working with Pycharm). When I run the script everything is ok. But when I define a method and run it in the terminal, it raises an error: TesseractNotFoundError:…
Acil
0 votes, 0 answers

Tesseract force space and character replacement

I am running Tesseract on image files. Here is the code used to extract text from images: #Extract text from Image im = Image.open(r"C:\Users\XXXXX") text = pytesseract.image_to_string(im, lang = 'eng',config='--psm 1 --oem…
0 votes, 0 answers

How to improve the result of pytesseract?

I am applying pytesseract to my project and did not get the desired results, so I started to optimize a bit: I trained the font from the website; I made the image binary (black and white); I put in only the characters that the images will have (A…
0 votes, 0 answers

tesserocr get per-word orientation

I'm using the tesserocr library to perform OCR on an image. Some of the words in the image are vertical, and some are horizontal. Is there any way to tell the orientation of the words in the image on a per-word basis from this library? I want…
rma
0 votes, 1 answer

Having trouble creating env variables that take effect while running my server

This is my first question here, so although I'll try my best to ask the question correctly, please have patience with me. I'm trying to run OCR with Tesseract and Django on a hosted server (PythonAnywhere, if it's important in any way),…
Orikle