Questions tagged [python-tesseract]

Python-tesseract (pytesseract) is a Python wrapper for the Tesseract OCR engine. It reads conventional image files (JPG, GIF, PNG, TIFF, etc.) and returns their text, data about that text, or a converted PDF.

Python-tesseract is an OCR wrapper that allows any conventional image file (JPG, GIF, PNG, TIFF, etc.) to be read and decoded into usable text.
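As a rough illustration of the three kinds of output mentioned above (text, data about the text, and PDF), here is a minimal sketch. It assumes the `pytesseract` and `Pillow` packages and the Tesseract binary are installed. The function name `ocr_outputs` and its parameters are hypothetical, chosen only for this example.

```python
def ocr_outputs(image_path, lang="eng"):
    """Return plain text, word-level data, and PDF bytes for one image.

    Hypothetical helper for illustration; not part of pytesseract itself.
    """
    # Imported inside the function so this sketch can be loaded even on a
    # machine where the OCR stack is not installed yet.
    import pytesseract
    from PIL import Image

    with Image.open(image_path) as image:
        # Plain recognized text as a single string.
        text = pytesseract.image_to_string(image, lang=lang)
        # Per-word boxes, confidences and text, returned as a dict of lists.
        data = pytesseract.image_to_data(
            image, lang=lang, output_type=pytesseract.Output.DICT
        )
        # Searchable PDF as raw bytes, ready to be written to a file.
        pdf_bytes = pytesseract.image_to_pdf_or_hocr(
            image, lang=lang, extension="pdf"
        )
    return text, data, pdf_bytes
```

A call such as `ocr_outputs("scan.png", lang="spa")` (with `scan.png` standing in for your own file) would run Spanish recognition, provided the corresponding language data is installed for Tesseract.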

Tesseract is advertised as the most accurate open source OCR engine available. It was developed at HP Labs between 1985 and 1995 and then remained dormant until 2006 when Google revived the project.

For more information, please see the Python-tesseract page or the Tesseract page.

1664 questions
0
votes
1 answer

Does anyone know how Tesseract - OCR postprocessing / spellchecking works?

I was using tesseract-ocr (pytesseract) for Spanish, and it achieves very high accuracy when you set the language to Spanish and, of course, the text is in Spanish. If you do not set the language to Spanish, it does not perform as well. So, I'm…
Tomas
0
votes
1 answer

Embedding Python in GRPC Server

I am exploring GRPC (C++). Following their examples, I am trying to create a server which accepts an image from the client and returns the text in the image. I have Python code which accepts an image and a JSON file describing the bounding box of the…
Raki
0
votes
1 answer

Getting Pytesseract Error while creating .exe file using pyinstaller

So basically I am trying to create a simple Flask app where we can use pytesseract to do OCR on an image and return the data as a string. I am also packaging the whole app into an .exe file using pyinstaller after doing the obfuscation of the…
Akash
0
votes
2 answers

How to get coordinates of characters in html document?

Reference: how do I extract only the 369 429 301 123 values from the above code using Python?
0
votes
0 answers

Read Clipboard image into function for OCR

I have written a function that reads the clipboard image when it is captured as a screenshot and then passes that captured image data to the OCR engine. I am struggling with the passing of data. The code is given below. from tkinter import messagebox from PIL…
0
votes
2 answers

Name error: Image to text error in python

I am working on developing code to convert an image to text using the code below. I see the error below while executing the code. I don't really understand what is causing the issue. Can anyone help me identify the issue? from PIL import…
KApril
0
votes
0 answers

Unable to extract text from those images

I tried to detect and extract text from the below images, but I am not able to get the header text properly. Image 1: Image 2: For those kinds of images, I am unable to detect and extract text from it. Please help me with those images. I tried the…
0
votes
1 answer

pyTesseract not outputting text from image

Maybe someone could help me! When I run the following code: import pytesseract from pytesseract import image_to_string from PIL import Image import PIL file = Image.open('/usr/local/Cellar/tesseract/4.1.0/share/tessdata/cap.png') we_will =…
0
votes
0 answers

Install ImageMagick and Ghostscript through Python

I am very new to Python. I have an OCR program that uses Tesseract, ImageMagick and Ghostscript. I created a .exe file to give it to my team so that they can use it on their tool. The problem that I am facing is that all of them will have to…
Vadiraj Katti
0
votes
0 answers

Why doesn't my multi processor program take the whole path of my image?

I've been trying to use multiprocessing in a program that uses tesseract to extract text from images. But when I give it the name of my image, it only searches the directory for the first letter of the image name. def tess(all_clips): …
PRATHAMESH
0
votes
0 answers

How to Improve Pytesseract Results

I'm trying to solve some semi-simple CAPTCHA codes using Python 3 on my Raspberry Pi 4. This is my current code. from PIL import Image from pytesseract import image_to_string img=Image.open('/home/pi/Desktop/Captcha Code…
Michael
0
votes
1 answer

Unable to properly read text from image which has a Color text in python

Here is what I have tried so far. It works fine for most images where the text is black and the background is white. from PIL import Image import pytesseract import nltk import cv2 imageName = "p9.png" img = cv2.imread(imageName,cv2.IMREAD_COLOR) #Open the image from…
0
votes
0 answers

improper text alignment from pytesseract

I am trying to extract data from a PDF using pytesseract with the code below, but the text alignment is wrong when printing or writing the data to a doc. from PIL import Image import pytesseract import sys from pdf2image import convert_from_path import os PDF_file…
Ashu
0
votes
1 answer

Solved: Python multiprocessing imap BrokenPipeError: [Errno 32] Broken pipe pdftoppm

Let me first say that this is not a duplicate of the other similar questions, where people tend to manage more closely the pool of workers. I have been struggling with the following exception thrown by my code when using multiprocessing.Pool.imap: …
0
votes
0 answers

Tesseract-OCR / Pillow not working with Pycharm

I have installed Tesseract-OCR using the provided installer and added it to the path. I have also installed Pillow using pip through CMD. But when I attempt to import them in PyCharm, it says that the modules do not exist. When I type 'tesseract'…