Questions tagged [python-tesseract]

Python-tesseract is a wrapper for Tesseract OCR that reads conventional image files (JPG, GIF, PNG, TIFF, etc.) and returns their text or text data, or converts them to PDF.

Python-tesseract is a wrapper for OCR that reads conventional image files (JPG, GIF, PNG, TIFF, etc.) and decodes them into usable text.

Tesseract is advertised as the most accurate open source OCR engine available. It was developed at HP Labs between 1985 and 1995 and then remained dormant until 2006 when Google revived the project.

For more information, please see the Python-tesseract page or the Tesseract page.

1664 questions
0
votes
1 answer

Blackout number in pdf using OCR

I have a 3-page PDF that contains a scanned ID card. The ID card copy can be on any page. I need to black out the ID card number (format: 12 digits and two spaces, i.e. xxxx xxxx xxxx). Please suggest how I can achieve this. I tried microsoft computer…
Tony
  • 52
  • 15
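
A minimal sketch of the number format described in this question: a regular expression that finds candidate ID numbers (three groups of four digits separated by single spaces) in text returned by OCR. The name `find_id_numbers` and the sample text are hypothetical, and mapping matches back to page coordinates for the actual blackout is not shown.

```python
import re

# Assumption: the ID number is exactly three groups of four digits,
# separated by single spaces (xxxx xxxx xxxx), and is not part of a
# longer run of digits.
ID_PATTERN = re.compile(r"(?<!\d)\d{4} \d{4} \d{4}(?!\d)")


def find_id_numbers(text):
    """Return every substring of `text` that matches the ID number format."""
    return ID_PATTERN.findall(text)


# Hypothetical OCR output used only for illustration.
sample = "Name: A. Person\nID No: 1234 5678 9012\nPhone: 5550100"
print(find_id_numbers(sample))
```

The lookarounds keep the pattern from matching inside longer digit runs; real scans may need a looser pattern if OCR inserts extra spaces or misreads digits.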
0
votes
2 answers

Write image text to a new text file?

I am using tesseract for OCR. I am on Ubuntu 18.04. I have a program that extracts the text from an image and prints it. I want the program to create a new text file and write the extracted content into it, but I am only able to…
Gaurav Bahadur
  • 189
  • 2
  • 14
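
One possible way to do what this question asks, assuming the OCR step already yields a plain string: write that string to a file with `pathlib`. The helper name `save_text` and the file name are hypothetical, and the sample string stands in for real OCR output.

```python
import tempfile
from pathlib import Path


def save_text(text, output_path):
    """Write `text` to `output_path`, creating or overwriting the file."""
    path = Path(output_path)
    path.write_text(text, encoding="utf-8")
    return path


# A temporary directory keeps the demonstration free of side effects;
# in practice `output_path` would be wherever the text file should go.
with tempfile.TemporaryDirectory() as tmp:
    saved = save_text("Sample OCR output\n", Path(tmp) / "ocr_output.txt")
    print(saved.read_text(encoding="utf-8"))
```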
0
votes
0 answers

Trouble pre-processing image to make text clearer in preparation for extraction

I have some images of some ceramic plates. The one shown below is an example of the worst possible from the batch. I am having trouble preprocessing it before using tesseract on it to get the text (if it's possible at all). If someone could give…
0
votes
1 answer

OCR with tesseract, pre-processing image

I need to extract digits from images like the one shown below. I'm using tesseract now, but it isn't working. Can anyone help me with pre-processing the images before feeding them to tesseract?
0
votes
1 answer

Pass a directory of pdf files for performing OCR and generate .txt files for each converted file in Python

I have a directory containing pdf files. I have written the code that performs OCR when you pass a filename to an object of the wand.image class. What I want to do presently is to loop over the directory of pdf files and generate an OCR'd txt file…
ajai biltu
  • 55
  • 6
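
A rough sketch of the directory loop this question describes. It assumes the existing single-file OCR code can be wrapped in a callable that takes a PDF path and returns text; `ocr_directory`, `ocr_func`, and `fake_ocr` are hypothetical names, and no real OCR is performed here.

```python
import tempfile
from pathlib import Path


def ocr_directory(pdf_dir, ocr_func):
    """Run `ocr_func` on each PDF in `pdf_dir` and write a .txt file beside it.

    `ocr_func` is a placeholder for the existing single-file OCR code:
    it receives a PDF path and must return the recognized text.
    """
    written = []
    for pdf_path in sorted(Path(pdf_dir).glob("*.pdf")):
        txt_path = pdf_path.with_suffix(".txt")
        txt_path.write_text(ocr_func(pdf_path), encoding="utf-8")
        written.append(txt_path)
    return written


# Stand-in OCR function used only to demonstrate the loop.
def fake_ocr(pdf_path):
    return f"text from {pdf_path.name}"


with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "a.pdf").write_bytes(b"")
    (Path(tmp) / "b.pdf").write_bytes(b"")
    for txt in ocr_directory(tmp, fake_ocr):
        print(txt.name, "->", txt.read_text(encoding="utf-8"))
```

Passing the OCR step in as a function keeps the loop independent of whichever library does the recognition.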
0
votes
1 answer

identify clear text from image python

I used pytesseract to identify text in an image: pytesseract.pytesseract.tesseract_cmd = r'C:\Program Files\Tesseract-OCR\tesseract.exe' Then I used the code below to identify the text: textImg =…
0
votes
1 answer

Pytesseract behaving differently in Windows vs. Linux

I'm trying to make use of Pytesseract to do some very basic character recognition. When I run the following code in Linux, the output makes sense: import matplotlib.pyplot as plt import pandas as pd import sys import pytesseract # need to add…
ollerend
  • 500
  • 5
  • 17
0
votes
1 answer

Can pytesseract use ChoiceIterator to search over multiple matches?

Can pytesseract use ChoiceIterator to search over multiple matches? It seems to me that pytesseract is only an interface to the binary. tesserocr gives access to the Tesseract API which allows the use of ChoiceIterator. Example How do I use the…
qwr
  • 9,525
  • 5
  • 58
  • 102
0
votes
0 answers

Improving tesseract ocr result in french

I want to perform OCR on an image that is fairly clean and "easy" for OCR, I think: But the result using tesseract is quite bad: print(pytesseract.image_to_string(Image.open('file-2.jpg'),lang='fra')) Maintenant ie La QT vieux, lorsque je parcours…
Sulli
  • 763
  • 1
  • 11
  • 33
0
votes
1 answer

RuntimeError: TSVNotSupported: TSV output not supported. Tesseract >= 3.05 required (Google Dataflow)

I currently want to distribute text detection on Google Dataflow over a huge dataset. I'm using the python package of tesseract, which gets installed without a problem. The problem occurs when installing the tesseract-ocr package. It seems like it's…
0
votes
0 answers

Error importing PDF image to convert to text

I have a PDF image for transfer to image format so I am trying to read the PDF image and store the data in the text file. import pytesseract from PIL import Image img = Image.open('1.pdf') text = pytesseract.image_to_string(img) with open('1.txt',…
0
votes
0 answers

How do I fetch the source file from pytesseract extract

So the gist is that after I extract the OCR/tesseract data from a pool of images, I run re.findall(r'example'). How would I fetch the source file that contains the word "Mountain"? It's still a bit vague on my part. Can you help out? Thanks! for index,…
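
One way to keep track of the source file, sketched under the assumption that the OCR text of each image is stored in a dictionary keyed by file name. The names `files_containing` and `ocr_results` and the sample data are hypothetical.

```python
import re


def files_containing(texts_by_file, word):
    """Return the file names whose OCR text contains `word` as a whole word."""
    pattern = re.compile(rf"\b{re.escape(word)}\b")
    return [name for name, text in texts_by_file.items() if pattern.search(text)]


# Hypothetical OCR results keyed by source file name.
ocr_results = {
    "img_001.png": "A view of the Mountain at dawn",
    "img_002.png": "River and forest",
}
print(files_containing(ocr_results, "Mountain"))
```

Keeping the file name next to its text from the start means the regular expression search can report where each match came from.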
0
votes
0 answers

Automate covering up text on image

I am just wondering if it is possible to use OCR such as pytesseract to automate covering text on an image? I know that pytesseract provides image_to_boxes(), which basically gets the box for each corresponding character. However, I do not want to…
Darren Christopher
  • 3,893
  • 4
  • 20
  • 37
0
votes
0 answers

How to get text from an image using pytesseract?

I have a scenario where I have to fetch some text from an image. But I am getting the following errors when trying to do so: runfile('/Users/vivekchowdary/Documents/untitled folder/pytesseract.py', wdir='/Users/vivekchowdary/Documents/untitled…
0
votes
1 answer

Python to read text from picture giving some import package errors

Unable to read text from a picture using PIL and pytesseract import PIL from PIL import Image import pytesseract im = PIL.Image.open('C:\\Users\\Edgar.Lizarraga\\Desktop\\Kaizen-Continuous-Improvement-Model.png') x =…