Questions tagged [python-tesseract]

Python-tesseract is a wrapper class for Tesseract OCR that allows any conventional image file (JPG, GIF, PNG, TIFF, etc.) to be read so that you can get its text, get data about that text, or even convert it to PDF.

Python-tesseract is a wrapper class for Tesseract OCR that allows any conventional image file (JPG, GIF, PNG, TIFF, etc.) to be read and decoded into usable text.
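
As a rough illustration of the "image in, text out" use described above, here is a minimal sketch. It assumes the pytesseract package and the Tesseract binary are installed; `extract_text` and its `ocr` parameter are hypothetical names introduced only for this example.

```python
def extract_text(image_path, lang="eng", ocr=None):
    """Return the text recognised in the image at image_path.

    `ocr` is a hypothetical injection point so the function can be tried
    without Tesseract installed; by default it is pytesseract.image_to_string.
    """
    if ocr is None:
        # Imported lazily: needs `pip install pytesseract` and the
        # Tesseract binary available on PATH.
        import pytesseract
        ocr = pytesseract.image_to_string
    # image_to_string accepts a file path as well as a PIL image.
    return ocr(str(image_path), lang=lang).strip()
```

With Tesseract installed, `print(extract_text("scan.png"))` would print the recognised text; the file name here is only a placeholder.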

Tesseract is advertised as the most accurate open source OCR engine available. It was developed at HP Labs between 1985 and 1994 and then lay largely dormant until HP released it as open source in 2005 and Google took over its development in 2006.

For more information, please see the Python-tesseract page or the Tesseract page.

1664 questions

1 answer

Python text extraction from a video game screenshot

I am building a Discord bot with discord.py for the video game Diablo 2. One of its features requires the bot to extract the names and properties of items from Diablo 2 screenshots. I am currently using pytesseract for this but I am not…

python opencv ocr screenshot python-tesseract

asked Jan 31 '21 at 13:50

mostsignificant

3 answers

How to fix problem of "ModuleNotFoundError: No module named 'PIL'"?

I tried the solutions given on Stack Overflow, but the problem is not resolved. I am trying to extract text from images with the pytesseract module in Python. These are the steps I followed: code: py -m pip install --user virtualenv py -m…

python python-imaging-library python-tesseract

asked Jan 12 '21 at 11:36

krishna
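
For the ModuleNotFoundError question above, a small diagnostic sketch may help. The `PIL` import name is provided by the Pillow distribution, and a frequent cause of this error is installing it into a different interpreter or virtual environment than the one running the script. The helper names below are hypothetical; the hint is built from `sys.executable` so that pip targets the interpreter actually in use.

```python
import importlib.util
import sys

# Maps an import name to the pip distribution that provides it.
# "PIL" is provided by the Pillow distribution.
PIP_NAMES = {"PIL": "Pillow"}

def missing_modules(names):
    """Return the top-level import names that this interpreter cannot find."""
    return [n for n in names if importlib.util.find_spec(n) is None]

def install_hint(names):
    """Return a pip command for the missing modules, or "" if none are missing."""
    missing = missing_modules(names)
    if not missing:
        return ""
    packages = " ".join(PIP_NAMES.get(n, n) for n in missing)
    return f"{sys.executable} -m pip install {packages}"
```

Running `print(install_hint(["PIL", "pytesseract"]))` inside the failing environment shows which packages, if any, still need to be installed there.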

1 answer

Unix terminal screenshot to text

Having thought this might be a fairly easy task, I wanted to take a screenshot of a Unix terminal and convert it into text, or as close as possible to what I would get if I had copied that text from the terminal. I have been digging around and the common choice for image to…

python opencv python-tesseract

asked Dec 29 '20 at 00:54

Chris Doyle

1 answer

How to remove horizontal and vertical lines without degrading the image quality in Python

I am trying to remove horizontal and vertical lines from an image. This image is generated from a PDF using the pdf2jpg library. Once the horizontal and vertical lines are removed, the image will be fed to pytesseract to extract words and their individual…

python python-3.x opencv python-tesseract

asked Nov 27 '20 at 15:20

Roy

1 answer

Use Tesseract OCR to extract text from a folder of scanned PDFs

I have code that extracts text from scanned and normal PDF files using Tesseract OCR. But I want my code to convert a whole folder of PDFs rather than a single PDF file, and then store the extracted text files in a folder that I…

python pdf text tesseract python-tesseract

asked Sep 20 '20 at 20:51

CodingStark
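
The folder question above can be sketched as a loop over the existing single-file code. The asker's converter is not shown in the excerpt, so `pdf_to_text` below is a hypothetical callable standing in for it (it takes a path to one PDF and returns the extracted text); `convert_folder` is likewise a name made up for this example.

```python
from pathlib import Path

def convert_folder(src_dir, dst_dir, pdf_to_text):
    """Write one .txt file into dst_dir for every .pdf file in src_dir.

    `pdf_to_text` is a hypothetical stand-in for the existing
    single-file OCR code: Path to a PDF in, extracted text out.
    """
    src, dst = Path(src_dir), Path(dst_dir)
    dst.mkdir(parents=True, exist_ok=True)  # create the output folder if needed
    written = []
    for pdf in sorted(src.glob("*.pdf")):   # non-PDF files are skipped
        out = dst / (pdf.stem + ".txt")
        out.write_text(pdf_to_text(pdf), encoding="utf-8")
        written.append(out)
    return written
```

Keeping the per-file conversion as a parameter means the OCR code does not have to change; only the loop and the output naming are new.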

0 answers

Tesseract returns reversed words with Arabic

Hello everyone. I'm trying to extract the license plate number from Tunisian cars, so I decided to use Tesseract to extract the numbers and the word 'تونس'. Before that, I installed tesseract-OCR v5.0.0 for Windows 10 and I wanted to try on an…

python ocr tesseract arabic python-tesseract

asked Sep 18 '20 at 20:55

Ameni Neffati

0 answers

pytesseract image_to_pdf_or_hocr output pdf and also text

Is there a way to make pytesseract.image_to_pdf_or_hocr output both PDF and text data? Currently I am doing it like this: pdf = pytesseract.image_to_pdf_or_hocr(fp.name, extension='pdf') text = pytesseract.image_to_string(fp.name) is there a way to do…

python tesseract python-tesseract

asked Jun 25 '20 at 01:48

Baconator507
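
One possible workaround for the PDF-plus-text question above is to bypass pytesseract and call the Tesseract command line directly, since the CLI accepts several output config names in one run. This is only a sketch: it assumes the `tesseract` binary is on PATH, and both function names are hypothetical. It does not show whether pytesseract itself exposes a single call for this.

```python
import subprocess

def tesseract_command(image_path, output_base, formats=("pdf", "txt")):
    """Build a Tesseract CLI call that writes several formats in one run.

    Each entry in `formats` is a Tesseract output config name; Tesseract
    writes output_base.<format> for each of them.
    """
    return ["tesseract", str(image_path), str(output_base), *formats]

def ocr_to_pdf_and_text(image_path, output_base):
    """Run Tesseract once, then read back the PDF bytes and the text."""
    subprocess.run(tesseract_command(image_path, output_base), check=True)
    with open(f"{output_base}.pdf", "rb") as f:
        pdf = f.read()
    with open(f"{output_base}.txt", encoding="utf-8") as f:
        text = f.read()
    return pdf, text
```

The image is recognised only once this way, at the cost of going through files on disk.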

1 answer

Pytesseract - OCR on image with text in different colors

Pytesseract is unable to extract text when the text appears in different colors. I tried using OpenCV to invert the image, but that doesn't work for dark text colors. The image: import cv2 import pytesseract from PIL import Image def…

python opencv python-imaging-library ocr python-tesseract

asked Apr 10 '20 at 05:12

Abhi

0 answers

Pytesseract OCR for single character on images

I am trying to read text from an image. The image consists of a single character, and it is not being read correctly. This is the type of image I have. It reads this image as 't', and most of the others are also read incorrectly. These are some of my images…

python ocr python-tesseract

asked Mar 25 '20 at 14:46

hotshot code

0 answers

Python speed up Pytesseract / Use native tesseract library in Python

I use the import cv2,pyautogui,numpy as np img=np.array(pyautogui.screenshot()) pytesseract.image_to_string(img, lang='eng') command to get the python wrapper for tesseract to get text from an image for me, which goes through the cli interface…

python tesseract python-tesseract

asked Mar 25 '20 at 06:49

azazelspeaks

0 answers

OCR Tesseract - Get Image Font Attributes

I have been using Pytesseract to extract text from images. I am currently working on restoring an image document. Aside from extracting text from an image, I also want to identify each word's font, font size, whether the character is capital or…

python image image-processing tesseract python-tesseract

asked Feb 12 '20 at 15:39

alyssaeliyah

1 answer

Recognize specific numbers from table image with Pytesseract OCR

I want to read a column of numbers from an attached image (PNG file). My code is import cv2 import pytesseract import os img = cv2.imread(os.path.join(image_path, image_name), 0) config= "-c …

python opencv ocr image-recognition python-tesseract

asked Feb 02 '20 at 06:34

MollyBFL

0 answers

PyTesseract causes PIL to raise ValueError: tile cannot extend outside image

I'm working on a program that splits a picture into a bunch of different parts, then converts each part to a string using pytesseract. The problem is that PIL, which is used in Pytesseract, keeps raising ValueError: tile cannot extend outside image.…

python python-imaging-library valueerror python-tesseract

asked Jan 26 '20 at 17:00

WRuet
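
The excerpt above is truncated, so the real cause of the ValueError is not known. One possibility is that some crop boxes are empty or reach past the image edge when the picture is split. The sketch below, with the hypothetical helper `tile_boxes`, computes boxes that always stay inside the image by clamping the last row and column.

```python
def tile_boxes(width, height, tile_w, tile_h):
    """Return (left, upper, right, lower) boxes that cover the whole image.

    Tiles in the last column and row are clamped to the image size, so
    no box is empty and none extends outside the image.
    """
    if tile_w <= 0 or tile_h <= 0:
        raise ValueError("tile size must be positive")
    boxes = []
    for upper in range(0, height, tile_h):
        for left in range(0, width, tile_w):
            right = min(left + tile_w, width)    # clamp to the right edge
            lower = min(upper + tile_h, height)  # clamp to the bottom edge
            boxes.append((left, upper, right, lower))
    return boxes
```

Each box can then be passed to Pillow's `Image.crop(box)` and the cropped part handed to `pytesseract.image_to_string`.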

2 answers

raise TesseractError(proc.returncode, get_errors(error_string))

I am trying to extract text from an image using the pytesseract module in Python but I keep getting an error when I execute my code below. There is a similar question that someone provided with this answer…

python image ocr tesseract python-tesseract

asked Jan 02 '20 at 17:45

zlk2000

1 answer

How to read digits from an image with PyTesseract OCR?

I'm trying to get PyTesseract OCR to read digits from this simple and well-cropped image, but for some reason it's just not able to do this. from PIL import Image import pytesseract as p def obtain_balance(a): im = Image.open(a) …

python image ocr tesseract python-tesseract

asked Dec 08 '19 at 17:27

THE YOGOVO
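
For the digit-reading question above, a commonly tried configuration is to treat the image as a single text line (`--psm 7`) and restrict recognition to digits with `tessedit_char_whitelist`. The following is a sketch under those assumptions, not the asker's code: `digits_config`, `read_digits`, and the `ocr` parameter are hypothetical names, and the whitelist is reportedly not honoured by the LSTM engine in Tesseract 4.0.

```python
def digits_config(psm=7, whitelist="0123456789"):
    """Build a Tesseract config string: page segmentation mode plus whitelist."""
    return f"--psm {psm} -c tessedit_char_whitelist={whitelist}"

def read_digits(image, whitelist="0123456789", ocr=None):
    """Return only the whitelisted characters recognised in `image`.

    `ocr` is a hypothetical injection point; by default it is
    pytesseract.image_to_string, which needs Tesseract installed.
    """
    if ocr is None:
        import pytesseract
        ocr = pytesseract.image_to_string
    raw = ocr(image, config=digits_config(whitelist=whitelist))
    # Drop whitespace and any stray characters the engine still returns.
    return "".join(ch for ch in raw if ch in whitelist)
```

If the value contains a decimal point, passing `whitelist="0123456789."` keeps it in the result.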
