Questions tagged [python-tesseract]

Python-tesseract is a wrapper class for Tesseract OCR that reads conventional image files (JPG, GIF, PNG, TIFF, etc.) and returns their text, structured data about that text, or a PDF conversion.

Python-tesseract is a wrapper class for OCR that allows any conventional image files (JPG, GIF, PNG, TIFF, etc.) to be read and decoded into usable text.
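
As a rough illustration of the three kinds of output mentioned above (text, data about the text, PDF), here is a minimal sketch. It assumes the `pytesseract` and `Pillow` packages and the Tesseract binary are installed; the function name `ocr_outputs` and its parameters are hypothetical and chosen only for this example.

```python
def ocr_outputs(image_path, lang="eng"):
    """Return plain text, word-level data, and PDF bytes for one image.

    Hypothetical helper: assumes pytesseract, Pillow, and the Tesseract
    binary are installed and that `lang` has trained data available.
    """
    # Imported inside the function so the module still loads on machines
    # where the OCR stack is not installed.
    from PIL import Image
    import pytesseract
    from pytesseract import Output

    with Image.open(image_path) as image:
        # Plain recognized text as a single string.
        text = pytesseract.image_to_string(image, lang=lang)
        # Per-word positions and confidences as a dict of parallel lists.
        data = pytesseract.image_to_data(image, lang=lang, output_type=Output.DICT)
        # A searchable PDF rendered from the same image, returned as bytes.
        pdf_bytes = pytesseract.image_to_pdf_or_hocr(image, lang=lang, extension="pdf")
    return text, data, pdf_bytes
```

A call such as `text, data, pdf_bytes = ocr_outputs("sample.png")` (with `sample.png` standing in for any image on disk) would return all three results; the PDF bytes can then be written to a file opened in binary mode.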

Tesseract is advertised as the most accurate open source OCR engine available. It was developed at HP Labs between 1985 and 1995 and then remained dormant until 2006 when Google revived the project.

For more information, please see the Python-tesseract page or the Tesseract page.

1664 questions
4
votes
2 answers

Tesseract OCR having trouble detecting numbers

I am trying to detect some numbers with tesseract in python. Below you will find my starting image and what I can get it down to. Here is the code I used to get it there. import pytesseract import cv2 import numpy as…
4
votes
2 answers

How to remove noise around numbers using OpenCV

I'm trying to use Tesseract-OCR to get the readings from the images below, but I am having issues getting consistent results with the spotted background. I have the configuration below for pytesseract: CONFIG = f"--psm 6 -c…
MisterButter
  • 749
  • 1
  • 10
  • 27
4
votes
0 answers

OCRmyPDF cannot find Leptonica Library

I installed the OCRmyPDF package in a conda environment that I have been using with pytesseract. When I ran the command "ocrmypdf --help" I received the following error: [WinError 2] The system cannot find the file specified Traceback (most recent…
terry
  • 141
  • 7
4
votes
1 answer

Tesseract 4.1.1 error eng.traineddata not found in google colab

I am trying to install Tesseract 4.1.1 in Google Colab. I have installed Tesseract and can check the version using !tesseract --version. After that I downloaded eng.traineddata and org.traineddata into the /usr/local/share/tessdata/ folder. Now…
Joy Mazumder
  • 870
  • 1
  • 8
  • 14
4
votes
0 answers

Extracting invoice data using OpenCV

We are trying to extract invoice data (PDF/image) using deep learning libraries, e.g. OpenCV or any other. We receive multiple invoices as PDFs or images on a daily basis, from which we have to capture certain fields like Bill No,…
NKJ
  • 457
  • 1
  • 4
  • 11
4
votes
1 answer

How to find center coordinates of numbers in an image

I'm currently working on my first assignment in image processing (using OpenCV in Python). My assignment is to calculate a precise score (to tenths of a point) of one to several shooting holes in an image uploaded by a user. One of the requirements…
JakubS
  • 133
  • 6
4
votes
1 answer

(2, 'Usage: pytesseract [-l lang] input_file') on Google Colab

I am trying to run Tesseract in Google Colab: !sudo apt install tesseract-ocr !pip install pytesseract import pytesseract import shutil import os import random try: from PIL import Image except ImportError: import Image from google.colab…
ardito.bryan
  • 429
  • 9
  • 22
4
votes
3 answers

OSError: [WinError 740] The requested operation requires elevation

I have some simple code that takes an image called "try.png", and I want to convert it from image to text using pytesseract, but I am having some issues with the code. import cv2 import…
4
votes
1 answer

How to enhance Tesseract automatic text rotation capabilities for OCR?

I have a set of PIL images, where some pages are correctly rotated, while others have a rotation close to 180°. This means that automatic orientation detection may fail, as it recognizes a 2° orientation instead of 178°. Unfortunately,…
linello
  • 8,451
  • 18
  • 63
  • 109
4
votes
0 answers

How to use Pytesseract for a Kivy app on Android?

So I am planning to use Tesseract for a Kivy app. But I am not sure if it will work on Android, because in Tesseract you have to give a path to the executable. I have read a post here about this, and it was said that we need a recipe or wrapper for…
Lalli Garden
  • 259
  • 1
  • 13
4
votes
2 answers

cv2 to tesseract directly without saving

import pytesseract from pdf2image import convert_from_path, convert_from_bytes import cv2,numpy def pil_to_cv2(image): open_cv_image = numpy.array(image) return open_cv_image[:, :, ::-1].copy() path='OriginalsFile.pdf' images =…
user11322408
4
votes
1 answer

How to improve OCR accuracy?

I have two images, shown below. A.png is read perfectly by Tesseract, but B.png is read with very poor accuracy even though it is similar to A.png. How can I improve the accuracy? I have no idea where to start debugging. A.png B.png Run…
zono
  • 8,366
  • 21
  • 75
  • 113
4
votes
0 answers

Text recognition on rotated and intersecting characters from images

I wrote code for recognizing words and letters from images using Tesseract-OCR and OpenCV, but it is only suitable for flat letters and words. The question is how to improve this code so that it can recognize rotated and intersecting characters and…
4
votes
1 answer

Recognize single characters on a page with Tesseract

This image returns an empty string. Basically, I am trying to make a bot for the WOW game, but I am really new to OCR. I cannot get Tesseract to read this image; I want an unordered list of characters and, if possible, the coordinates of each square…
4
votes
2 answers

Drawing bounding boxes with Pytesseract / OpenCV

I'm using pytesseract (0.3.2) with OpenCV (4.1.2) to identify digits in images. While image_to_string is working, image_to_data and image_to_boxes are not. I need to be able to draw the bounding boxes on the images and this has stumped me. I've…
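
For readers landing on this question, one possible way to get from `image_to_data` output to drawable rectangles is sketched below. It is not taken from the question or its answers. It assumes the dictionary form returned by `pytesseract.image_to_data(img, output_type=pytesseract.Output.DICT)`, with parallel lists under the keys `text`, `conf`, `left`, `top`, `width` and `height`. The helper name `word_boxes` and the sample values are hypothetical.

```python
def word_boxes(data, min_conf=0.0):
    """Turn an image_to_data-style dict into (text, top_left, bottom_right) tuples.

    Entries with empty text or a confidence below `min_conf` are skipped;
    Tesseract reports a confidence of -1 for rows that are not words.
    """
    boxes = []
    for i, text in enumerate(data["text"]):
        # Confidence may arrive as a string or a number depending on version.
        conf = float(data["conf"][i])
        if not text.strip() or conf < min_conf:
            continue
        x, y = data["left"][i], data["top"][i]
        w, h = data["width"][i], data["height"][i]
        boxes.append((text, (x, y), (x + w, y + h)))
    return boxes


# Hand-written sample in the assumed shape (illustrative values only).
sample = {
    "text": ["", "42", " "],
    "conf": ["-1", "96", "95"],
    "left": [0, 10, 30],
    "top": [0, 5, 5],
    "width": [100, 15, 4],
    "height": [50, 20, 20],
}
boxes = word_boxes(sample)
```

Each resulting pair of corner points can be passed to OpenCV, for example `cv2.rectangle(img, top_left, bottom_right, (0, 255, 0), 2)`, to draw the box on the image.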