Questions tagged [python-tesseract]

Python-tesseract is a Python wrapper for Tesseract OCR that reads any conventional image file (JPG, GIF, PNG, TIFF, etc.) and returns its text, structured data about that text, or a PDF conversion.

Python-tesseract is a wrapper for the Tesseract OCR engine that allows any conventional image file (JPG, GIF, PNG, TIFF, etc.) to be read and decoded into usable text.

Tesseract is advertised as the most accurate open source OCR engine available. It was developed at HP Labs between 1985 and 1995 and then remained dormant until 2006 when Google revived the project.

For more information, please see the Python-tesseract page or the Tesseract page.
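As a minimal sketch of the three kinds of output described above (plain text, data about the text, and PDF), the helper below calls pytesseract's `image_to_string`, `image_to_data`, and `image_to_pdf_or_hocr`. The function name `ocr_image` and the `engine` parameter are illustrative assumptions, not part of the library; `engine` exists only so a stand-in object can be passed where Tesseract is not installed.

```python
def ocr_image(image_path, engine=None):
    """Return plain text, TSV word data, and PDF bytes for one image.

    `ocr_image` and `engine` are hypothetical names used for illustration.
    When `engine` is None, the real pytesseract module is imported, which
    also requires the Tesseract binary to be installed and on PATH.
    """
    if engine is None:
        import pytesseract as engine

    path = str(image_path)
    return {
        # Recognized text as a single string.
        "text": engine.image_to_string(path),
        # Per-word boxes and confidences; a TSV string by default.
        "data": engine.image_to_data(path),
        # Bytes of a searchable PDF built from the image.
        "pdf": engine.image_to_pdf_or_hocr(path, extension="pdf"),
    }
```

With pytesseract installed, `ocr_image("scan.png")["text"]` would return the recognized text for a hypothetical file `scan.png`.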

1664 questions
3
votes
1 answer

Python text extraction from a video game screenshot

I am building a Discord bot with discord.py for the video game Diablo 2. One of its features requires the bot to extract the names and properties of items from Diablo 2 screenshots. I am currently using pytesseract for this but I am not…
mostsignificant
  • 312
  • 1
  • 8
3
votes
3 answers

How to fix problem of "ModuleNotFoundError: No module named 'PIL'"?

I tried the solutions given on Stack Overflow, but the problem is not resolved. I am trying to extract text from images with the pytesseract module in Python. These are the steps I followed: code: py -m pip install --user virtualenv py -m…
krishna
  • 401
  • 6
  • 16
3
votes
1 answer

Unix terminal screenshot to text

Thinking this would be a fairly easy task, I wanted to take a screenshot of a Unix terminal and convert it into text, as close as possible to what I would get by copying that text from the terminal. I have been digging around and the common choice for image to…
Chris Doyle
  • 10,703
  • 2
  • 23
  • 42
3
votes
1 answer

How to remove horizontal and vertical lines without degrading the image quality in Python

I am trying to remove horizontal and vertical lines from an image. This image is generated from a PDF using the pdf2jpg library. After the horizontal and vertical lines are removed, the image will be fed to pytesseract to extract words and their individual…
Roy
  • 344
  • 2
  • 12
3
votes
1 answer

Use Tesseract OCR to extract text from a folder of scanned PDFs

I have code that extracts text from scanned and normal PDF files using Tesseract OCR. But I want my code to convert a folder of PDFs rather than a single PDF file, with the extracted text files then stored in a folder that I…
CodingStark
  • 199
  • 3
  • 17
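The folder-level loop this question asks about might be sketched as follows. This is an assumption-laden outline, not the asker's code: `ocr_folder` and `extract_text` are hypothetical names, and `extract_text` stands in for whatever single-file routine already turns one PDF into a string.

```python
from pathlib import Path


def ocr_folder(input_dir, output_dir, extract_text, pattern="*.pdf"):
    """Run `extract_text` on every matching file and save one .txt per input.

    `extract_text` is a caller-supplied function (hypothetical here) that
    takes a Path to one PDF and returns its text as a string.
    """
    input_dir, output_dir = Path(input_dir), Path(output_dir)
    output_dir.mkdir(parents=True, exist_ok=True)

    written = []
    # Sort so the processing order is stable across runs.
    for pdf_path in sorted(input_dir.glob(pattern)):
        text = extract_text(pdf_path)
        out_path = output_dir / (pdf_path.stem + ".txt")
        out_path.write_text(text, encoding="utf-8")
        written.append(out_path)
    return written
```

Passing the single-file function as an argument keeps the loop independent of how the OCR itself is done, so an existing routine can be reused unchanged.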
3
votes
0 answers

Tesseract returns reversed words with Arabic

Hello everyone, I'm trying to extract license plate numbers from Tunisian cars, so I decided to use Tesseract to extract the numbers and the word 'تونس'. Before that, I installed tesseract-OCR v5.0.0 for Windows 10 and I wanted to try on an…
3
votes
0 answers

pytesseract image_to_pdf_or_hocr output pdf and also text

Is there a way to make pytesseract.image_to_pdf_or_hocr output both PDF and text data? Currently I am doing it like this: pdf = pytesseract.image_to_pdf_or_hocr(fp.name, extension='pdf') text = pytesseract.image_to_string(fp.name) Is there a way to do…
Baconator507
  • 1,747
  • 2
  • 12
  • 20
3
votes
1 answer

Pytesseract - OCR on image with text in different colors

Pytesseract is unable to extract text when the text appears in different colors. I tried using OpenCV to invert the image, but it doesn't work for dark text colors. The image: import cv2 import pytesseract from PIL import Image def…
Abhi
  • 442
  • 1
  • 10
  • 24
3
votes
0 answers

Pytesseract OCR for single character on images

I am trying to read text from images that each consist of a single character, but they are not being read correctly. This is the type of image I have; it is read as 't'. Most of the images are read incorrectly. These are some of my images…
hotshot code
  • 173
  • 1
  • 10
3
votes
0 answers

Python speed up Pytesseract / Use native tesseract library in Python

I use the import cv2,pyautogui,numpy as np img=np.array(pyautogui.screenshot()) pytesseract.image_to_string(img, lang='eng') command to have the Python wrapper for Tesseract get text from an image for me, which goes through the CLI interface…
azazelspeaks
  • 5,727
  • 2
  • 22
  • 39
3
votes
0 answers

OCR Tesseract - Get Image Font Attributes

I have been using Pytesseract to extract text from images. I am currently working on restoring an image document. Aside from extracting text from an image, I also want to identify each word's font, font size, whether the character is capital or…
alyssaeliyah
  • 2,214
  • 6
  • 33
  • 80
3
votes
1 answer

Recognize specific numbers from table image with Pytesseract OCR

I want to read a column of numbers from an attached image (PNG file). My code is import cv2 import pytesseract import os img = cv2.imread(os.path.join(image_path, image_name), 0) config= "-c …
3
votes
0 answers

PyTesseract causes PIL to raise ValueError: tile cannot extend outside image

I'm working on a program that splits a picture into a bunch of different parts, then converts each part to a string using pytesseract. The problem is that PIL, which is used in Pytesseract, keeps raising ValueError: tile cannot extend outside image.…
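The excerpt does not show what triggers the error, so the following is only a sketch of one way to split an image into parts whose crop boxes are guaranteed to stay inside the image. The function name `tile_boxes` and its parameters are hypothetical; the tuples use the `(left, top, right, bottom)` order that PIL's `Image.crop` expects.

```python
def tile_boxes(width, height, tile_w, tile_h):
    """Return (left, top, right, bottom) boxes covering a width x height image.

    Boxes on the right and bottom edges are clamped to the image size, so
    no box extends past the image. All names here are illustrative.
    """
    boxes = []
    for top in range(0, height, tile_h):
        for left in range(0, width, tile_w):
            right = min(left + tile_w, width)    # clamp to the right edge
            bottom = min(top + tile_h, height)   # clamp to the bottom edge
            boxes.append((left, top, right, bottom))
    return boxes
```

Each box could then be passed to `Image.crop` before the resulting part is handed to pytesseract.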
3
votes
2 answers

raise TesseractError(proc.returncode, get_errors(error_string))

I am trying to extract text from an image using the pytesseract module in Python, but I keep getting an error when I execute my code below. There is a similar question to which someone provided this answer…
zlk2000
  • 53
  • 1
  • 7
3
votes
1 answer

How to read digits from an image with PyTesseract OCR?

I'm trying to get PyTesseract OCR to read digits from this simple, well-cropped image, but for some reason it's just not able to do this. from PIL import Image import pytesseract as p def obtain_balance(a): im = Image.open(a) …
THE YOGOVO
  • 119
  • 1
  • 8