Questions tagged [python-tesseract]

Python-tesseract is a Python wrapper for the Tesseract OCR engine. It reads conventional image files (JPG, GIF, PNG, TIFF, etc.) and returns their text, data about that text, or converts them to PDF.

Tesseract is advertised as the most accurate open-source OCR engine available. It was developed at Hewlett-Packard between 1985 and 1994, then saw little development until HP released it as open source in 2005 and Google took over its development in 2006.

For more information, please see the Python-tesseract page or the Tesseract page.
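
The three outputs named in the description above (text, data about the text, PDF) can be sketched as follows. This is a minimal sketch, assuming the wrapper in question is the `pytesseract` package with Pillow and a Tesseract binary installed; the helper name `ocr_outputs` is hypothetical and only for illustration.

```python
def ocr_outputs(image_path, lang="eng"):
    """Return plain text, word-level data, and PDF bytes for one image.

    Hypothetical helper; assumes pytesseract, Pillow, and the Tesseract
    binary are installed and on the PATH.
    """
    # Imported inside the function so the module still loads on machines
    # where the OCR stack is not installed.
    from PIL import Image
    import pytesseract

    with Image.open(image_path) as image:
        # Plain recognized text.
        text = pytesseract.image_to_string(image, lang=lang)
        # Per-word boxes and confidences as a dict of parallel lists.
        data = pytesseract.image_to_data(
            image, lang=lang, output_type=pytesseract.Output.DICT
        )
        # PDF rendering of the page, returned as bytes.
        pdf_bytes = pytesseract.image_to_pdf_or_hocr(
            image, lang=lang, extension="pdf"
        )
    return text, data, pdf_bytes
```

A call such as `ocr_outputs("page.png")` (the file name is a placeholder) would return the three results together; write `pdf_bytes` to a file opened in binary mode to save the PDF.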

1664 questions
11 votes, 3 answers

How to separate title and headers from body text in image

I am using tesseract (through the python wrapper) in order to extract text from documents. These documents do not include any images or tables, simply text. Is there any option to distinguish the titles/headings from the text? Ideally I want to be…
Prikers
11 votes, 1 answer

Can I test Tesseract OCR from the Windows command line?

I am new to Tesseract OCR. I tried to convert an image to TIF and run Tesseract on it from cmd in Windows to see the output, but I couldn't. Can you help me? What command should I use? Here is my sample image:
Akunar
10 votes, 4 answers

Simple Captcha Solving

I'm trying to solve some simple captchas using OpenCV and pytesseract. Some of the captcha samples are: I tried to remove the noisy dots with some filters: import cv2 import numpy as np import pytesseract img = cv2.imread(image_path) _, img =…
Mehran Torki
10 votes, 3 answers

How to detect subscript numbers in an image using OCR?

I am using tesseract for OCR, via the pytesseract bindings. Unfortunately, I encounter difficulties when trying to extract text including subscript-style numbers - the subscript number is interpreted as a letter instead. For example, in the basic…
dspencer
10 votes, 3 answers

What is the difference between Pytesseract and Tesserocr?

I'm using Python 3.6 on Windows 10 and already have Pytesseract installed, but I came across Tesserocr in some code, and I can't install it. What is the difference?
Soufiane S
10 votes, 3 answers

Real time OCR in python

The problem: I'm trying to capture my desktop with OpenCV and have Tesseract OCR find text and store it in a variable. For example, if I were playing a game and had the capture frame over a resource amount, I would want to print that value and use it. A…
Novet
10 votes, 2 answers

How to get the coordinates of the text recognized from an image using OCR in Python

I am trying to get the coordinates or positions of text characters in an image using Tesseract. I want to know the exact pixel position so that I can click that text using some other tool. Edit: import pytesseract from pytesseract import…
Maddy
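
One possible way to get clickable pixel positions, sketched under assumptions: `pytesseract.image_to_data(image, output_type=pytesseract.Output.DICT)` returns parallel lists that include `text`, `conf`, `left`, `top`, `width`, and `height`. The helper `word_centers` and the sample dictionary below are hypothetical; the sample is hand-written to mimic that shape, not real OCR output.

```python
def word_centers(data, min_conf=0.0):
    """Return (word, x, y) tuples with the center pixel of each word box.

    `data` is assumed to have the shape of pytesseract's Output.DICT result.
    """
    centers = []
    for i, word in enumerate(data["text"]):
        # Non-word rows (page, block, line) carry conf -1 and empty text.
        conf = float(data["conf"][i])
        if not word.strip() or conf < min_conf:
            continue
        x = data["left"][i] + data["width"][i] // 2
        y = data["top"][i] + data["height"][i] // 2
        centers.append((word, x, y))
    return centers


# Hand-written sample in the assumed shape (illustration only).
sample = {
    "text": ["", "Hello", "World"],
    "conf": ["-1", "96", "91"],
    "left": [0, 10, 70],
    "top": [0, 20, 20],
    "width": [200, 50, 60],
    "height": [100, 16, 16],
}
print(word_centers(sample))
```

The coordinates are relative to the image passed to Tesseract, so a screenshot crop would need its offset added back before clicking.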
9 votes, 1 answer

How to increase the resolution of text in scanned images in Python?

I use Tesseract OCR to extract text from scanned images. For a few images, the text is not properly recognized due to low resolution, and the output is irrelevant characters. Techniques applied: Increase the dpi to 300. Image pre-processing…
Jennifer
9 votes, 4 answers

How to install Tesseract for Python on Anaconda

Does anyone know how to install Tesseract for Python on Anaconda? I have a Windows system. The Anaconda website gives the installation for a Linux system: conda install -c auto pytesseract Would there be any alterations required for a windows…
VK1
9 votes, 5 answers

Highly inconsistent OCR result for tesseract

This is the original screenshot. I cropped the image into 4 parts and cleared the background as much as I could, but Tesseract only detects the last column here and ignores the rest. The output from the tesseract…
codefreaK
9 votes, 1 answer

How to set init only parameters with python tesseract?

I'm trying to set some Tesseract parameters using the python-tesseract wrapper, but I'm unable to do so for Init Only parameters. I've been reading the Tesseract documentation and it seems I must use Init() to set these. This is what the…
tiagosilva
9 votes, 1 answer

How to set tessedit_write_images in python-tesseract?

I'm trying to set tessedit_write_images but can't seem to do it; I can't see tessinput.tif anywhere. I'm doing: import tesseract api =…
tiagosilva
8 votes, 3 answers

Cannot import name '_imaging' from 'PIL'

I'm trying to run this code: import pyautogui import time from PIL import _imaging from PIL import Image import pytesseract time.sleep(5) captura = pyautogui.screenshot() codigo = captura.crop((872, 292, 983,…
Andresnex
8 votes, 1 answer

Python - OCR - pytesseract for PDF

I am trying to run the following code: import cv2 import pytesseract img = cv2.imread('/Users/user1/Desktop/folder1/pdf1.pdf') text = pytesseract.image_to_string(img) print(text) which gives me the following error: Traceback (most recent call…
adrCoder
8 votes, 3 answers

How to get confidence of each line using pytesseract

I have successfully set up Tesseract and can translate the images to text... text = pytesseract.image_to_string(Image.open(image)) However, I need to get the confidence value for every line. I cannot find a way to do this using pytesseract. Anyone…
buydadip
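
A possible approach to per-line confidence, as a sketch rather than a confirmed answer: `pytesseract.image_to_data(..., output_type=pytesseract.Output.DICT)` reports a confidence per word along with `page_num`, `block_num`, `par_num`, and `line_num`, so word confidences can be averaged per line. The helper `line_confidences` and the sample dictionary are hypothetical, and averaging is one assumed choice of aggregate (a minimum would be stricter).

```python
from collections import defaultdict


def line_confidences(data):
    """Return (line_text, mean_confidence) pairs in reading order.

    `data` is assumed to have the shape of pytesseract's Output.DICT result.
    """
    lines = defaultdict(list)
    for i, text in enumerate(data["text"]):
        conf = float(data["conf"][i])
        # Skip structural rows (conf -1) and empty strings.
        if conf < 0 or not text.strip():
            continue
        key = (
            data["page_num"][i],
            data["block_num"][i],
            data["par_num"][i],
            data["line_num"][i],
        )
        lines[key].append((text, conf))
    result = []
    for key in sorted(lines):
        words = lines[key]
        line_text = " ".join(word for word, _ in words)
        mean_conf = sum(conf for _, conf in words) / len(words)
        result.append((line_text, mean_conf))
    return result


# Hand-written sample in the assumed shape (illustration only).
sample = {
    "text": ["", "Hello", "World", "Bye"],
    "conf": ["-1", "90", "80", "70"],
    "page_num": [1, 1, 1, 1],
    "block_num": [1, 1, 1, 1],
    "par_num": [1, 1, 1, 1],
    "line_num": [0, 1, 1, 2],
}
print(line_confidences(sample))
```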