Questions tagged [tesseract]

Tesseract is an OCR (Optical Character Recognition) engine originally developed at HP Labs and now available as an open source library with development sponsored by Google.

Tesseract is an open-source, multilingual OCR (Optical Character Recognition) engine originally developed at HP Labs. It is now sponsored by Google and licensed under the Apache License 2.0. It currently recognizes 107 languages. Tesseract is written primarily in C++ and C. The project is hosted at https://github.com/tesseract-ocr/tesseract, and its support forum is at http://groups.google.com/group/tesseract-ocr.

4350 questions
1 vote, 0 answers

PyTesseract - blacklisting chars in a specific position

I am working in Python using PyTesseract and OpenCV. I have a photo that contains mixed numbers and letters. The photo shows a date in the format DDMMMYY, e.g. 01JAN22. Tesseract is having trouble telling the difference between 0 and O and a few…
Bigred
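As far as I know, `tessedit_char_whitelist` applies to the whole image rather than to individual character positions, so one workaround is to let Tesseract read the string and then coerce each position to the character class the DDMMMYY layout demands. The sketch below does that; the confusion maps (`DIGIT_FIXES`, `LETTER_FIXES`) and the function name `fix_ddmmmyy` are hypothetical and would need tuning for the actual font.

```python
import re
from typing import Optional

# Hypothetical confusion maps: what to substitute when a character lands in
# a slot of the wrong class. These are assumptions, not an exhaustive list.
DIGIT_FIXES = {"O": "0", "Q": "0", "D": "0", "I": "1", "L": "1",
               "Z": "2", "S": "5", "B": "8", "G": "6"}
LETTER_FIXES = {"0": "O", "1": "I", "2": "Z", "5": "S", "8": "B", "6": "G"}
MONTHS = {"JAN", "FEB", "MAR", "APR", "MAY", "JUN",
          "JUL", "AUG", "SEP", "OCT", "NOV", "DEC"}
# DDMMMYY: True where a digit is expected, False where a letter is expected.
DIGIT_SLOTS = [True, True, False, False, False, True, True]


def fix_ddmmmyy(raw: str) -> Optional[str]:
    """Coerce each position of an OCR'd DDMMMYY string; return None if unfixable."""
    text = re.sub(r"\s+", "", raw).upper()
    if len(text) != len(DIGIT_SLOTS):
        return None
    out = []
    for ch, want_digit in zip(text, DIGIT_SLOTS):
        if want_digit:
            ch = DIGIT_FIXES.get(ch, ch)
            if not ch.isdigit():
                return None
        else:
            ch = LETTER_FIXES.get(ch, ch)
            if not ch.isalpha():
                return None
        out.append(ch)
    result = "".join(out)
    # Reject results whose middle three letters are not a month abbreviation.
    return result if result[2:5] in MONTHS else None
```

For example, `fix_ddmmmyy("O1JAN22")` yields `"01JAN22"`, while a string whose middle part is not a month returns `None`, which can be used as a signal to retry with different preprocessing.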
1 vote, 2 answers

Tesseract is not installed or it's not in your PATH

I'm currently in an Ubuntu environment and facing a problem when using Tesseract and PyCharm. I get the error mentioned in the title when I try to run the following code in PyCharm: import cv2 import opencv import pytesseract image =…
kirkegaard
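pytesseract raises this error when it cannot launch the `tesseract` executable. A minimal diagnostic sketch, assuming the binary is actually installed (on Ubuntu, typically via the `tesseract-ocr` package): look it up with `shutil.which`, then fall back to a few common locations in case PyCharm runs with a different `PATH` than your shell. The fallback paths and the helper name `find_tesseract` are assumptions, not an exhaustive list.

```python
import os
import shutil
from typing import Iterable, Optional

# Assumed install locations; adjust for your system.
FALLBACK_PATHS = (
    "/usr/bin/tesseract",
    "/usr/local/bin/tesseract",
    r"C:\Program Files\Tesseract-OCR\tesseract.exe",
)


def find_tesseract(extra: Iterable[str] = FALLBACK_PATHS) -> Optional[str]:
    """Return a path to the tesseract binary, or None if none is found."""
    # First honour whatever PATH this process (e.g. the PyCharm run config) sees.
    found = shutil.which("tesseract")
    if found:
        return found
    # Otherwise try the explicit candidates.
    for candidate in extra:
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            return candidate
    return None
```

If a path is found, assigning it to `pytesseract.pytesseract.tesseract_cmd` tells pytesseract which binary to run, regardless of the `PATH` PyCharm was started with.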
1 vote, 2 answers

How can I maximise the reliability of Tesseract OCR for text recognition?

I am attempting to collect data from a shop in a game (Starbase) and feed it to a website so that it can be displayed as a candlestick chart. So far I have started using Tesseract OCR 5.0.0 and I have been running into issues…
1 vote, 0 answers

Tesseract not able to detect Korean text properly

I am learning how to detect Korean text. As a sample, I am using the Korean text on the back of a package, but pytesseract.image_to_string(img_pl,lang='kor') is not able to separate individual words when I query with the level set to word. Here is my…
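`image_to_string` returns plain text only; per-word results come from `pytesseract.image_to_data(img, lang='kor', output_type=pytesseract.Output.DICT)`, whose `level` column is 5 for word rows (1 page, 2 block, 3 paragraph, 4 line). The hypothetical helper below filters that dictionary down to words with their confidences and bounding boxes. Whether Tesseract splits the Korean text into the right words still depends on the model and the image quality.

```python
from typing import Dict, List, Tuple

WORD_LEVEL = 5  # in Tesseract's TSV / image_to_data output, level 5 rows are words

Box = Tuple[int, int, int, int]  # (left, top, width, height)


def words_from_data(data: Dict[str, list], min_conf: float = 0.0) -> List[Tuple[str, float, Box]]:
    """Extract (text, confidence, box) for every non-empty word row in image_to_data output."""
    words = []
    for i, level in enumerate(data["level"]):
        text = str(data["text"][i]).strip()
        conf = float(data["conf"][i])  # may arrive as str or number; non-word rows use -1
        if int(level) != WORD_LEVEL or not text or conf < min_conf:
            continue
        box = (int(data["left"][i]), int(data["top"][i]),
               int(data["width"][i]), int(data["height"][i]))
        words.append((text, conf, box))
    return words
```

Raising `min_conf` is a quick way to drop low-confidence fragments before deciding whether the segmentation itself is the problem.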
1 vote, 2 answers

Tesseract - Add reference does not work

I'm following the steps: Download binary here, add a reference of the assembly Tessnet2.dll to your .NET project. Download language data definition file here and put it in tessdata directory. Tessdata directory and your exe must be in the same…
The Mask
1 vote, 0 answers

Tesseract OCR, extract dark box from image / finding y-coordinate to text at

For pre-processing, I need to crop out any pixels that are not in the dark box. The intent is to crop out just the text. Finally, I do additional processing to turn it into a black and white image perfect for Tesseract. Unfortunately, I've been…
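One simple way to get the dark box's coordinates, assuming a grayscale image in which the box is the only region darker than a fixed threshold, is to find the first and last rows and columns that contain such pixels. This NumPy sketch (`dark_region_bbox` and the threshold of 80 are illustrative choices) returns bounds usable directly for slicing. If stray dark pixels appear elsewhere, OpenCV's `cv2.findContours` plus `cv2.boundingRect` on the largest contour is a more robust variant.

```python
from typing import Optional, Tuple

import numpy as np


def dark_region_bbox(gray: np.ndarray, threshold: int = 80) -> Optional[Tuple[int, int, int, int]]:
    """Return (top, bottom, left, right) of pixels darker than threshold, or None.

    bottom and right are exclusive, so gray[top:bottom, left:right] is the crop.
    """
    mask = gray < threshold                      # True where the pixel is "dark"
    rows = np.flatnonzero(mask.any(axis=1))      # row indices containing dark pixels
    cols = np.flatnonzero(mask.any(axis=0))      # column indices containing dark pixels
    if rows.size == 0:
        return None
    return int(rows[0]), int(rows[-1]) + 1, int(cols[0]), int(cols[-1]) + 1
```

With `top, bottom, left, right = dark_region_bbox(gray)`, the crop is `gray[top:bottom, left:right]`, and `top` is the y-coordinate of the box.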
1 vote, 1 answer

Cannot extract Persian/Farsi text from image in Python using pytesseract

I'm using pytesseract to extract Persian text from an image, but I get nothing. I downloaded fas.traineddata and put it in tessdata, but it is still not working. Here is my code: import cv2 import pytesseract from unidecode import…
Aref.T
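When a language silently returns nothing, it is worth confirming that Tesseract reads the `tessdata` folder you think it does. A hedged sketch: check that each requested `.traineddata` file exists in a given folder, and build a config string that passes that folder explicitly through Tesseract's `--tessdata-dir` option. The folder path and the helper names are hypothetical; running `tesseract --list-langs` is another way to see which languages the installed binary can find.

```python
from pathlib import Path
from typing import List, Optional


def missing_traineddata(langs: str, tessdata_dir: str) -> List[str]:
    """Return the codes in a '+'-joined language string whose .traineddata is absent."""
    folder = Path(tessdata_dir)
    return [code for code in langs.split("+")
            if code and not (folder / f"{code}.traineddata").is_file()]


def tess_config(tessdata_dir: Optional[str] = None, psm: int = 6) -> str:
    """Build a config string; --psm 6 assumes a single uniform block of text."""
    parts = [f"--psm {psm}"]
    if tessdata_dir:
        # Quoted so that folders with spaces survive pytesseract's argument splitting.
        parts.append(f'--tessdata-dir "{tessdata_dir}"')
    return " ".join(parts)
```

Usage would look like `pytesseract.image_to_string(img, lang="fas", config=tess_config("/path/to/tessdata"))`, after `missing_traineddata("fas", "/path/to/tessdata")` has come back empty.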
1 vote, 0 answers

Setting Tesseract to better detect a specific image

I have an image (the label of a microscopy slide) which I thought would be easy to OCR, because it is easily readable for humans. I am using the latest Tesseract v5 from the command line under Windows. However, with tesseract image.jpg image.txt --oem 1…
1 vote, 2 answers

How to install Tesseract 3.04 (old) in Alpine Linux, in Docker?

In my Dockerfile, I have RUN apk update && apk add tesseract-ocr=3.04, which errors with: unable to select packages: tesseract-ocr-4.1.3-r0: breaks: world[tesseract-ocr=3.04]. I've also tried adding tesseract-ocr=3.04.01, which is how it's listed on…
ultraGentle
1 vote, 0 answers

"Newbie" questions about PyTesseract OCR

I'm working on an application that would extract information from invoices that users photograph with their phones (using Flask and pytesseract). Everything works on the extraction and classification side for my needs, using the…
Axel
1 vote, 0 answers

NoClassDefFoundError: could not initialize class when running on another computer

I'm developing an application in Java using Maven. It works on my development computer (Windows), but when I try to run it on a different computer (Mac), I get NoClassDefFoundError: Could not initialize class net.sourceforge.tess4j.TessAPI. I'm new…
Nanthno
1 vote, 1 answer

How to detect language or script from an input image using Python or Tesseract OCR?

Given an input image which can be in any language or writing system, how do I detect what script the text in the picture uses? Any Python-based or Tesseract-OCR based solution would be appreciated. Note that script here means writing systems like…
Gokul NC
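Tesseract's orientation and script detection (OSD, `--psm 0`) reports a script name such as Latin, Cyrillic or Arabic, which matches the "writing system" sense of the question. It needs `osd.traineddata` installed and tends to be unreliable on very little text. With pytesseract, `pytesseract.image_to_osd(img)` returns the report as `Key: value` lines; the sketch below parses it. The helper names are hypothetical.

```python
from typing import Dict, Optional, Tuple


def parse_osd(osd_text: str) -> Dict[str, str]:
    """Turn 'Key: value' lines from Tesseract's OSD report into a dict of strings."""
    result = {}
    for line in osd_text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            result[key.strip()] = value.strip()
    return result


def detected_script(osd_text: str) -> Tuple[Optional[str], float]:
    """Return (script name, script confidence); confidence is NaN if not reported."""
    info = parse_osd(osd_text)
    return info.get("Script"), float(info.get("Script confidence", "nan"))
```

A low script confidence is a hint to fall back on something else (for example trying several `lang=` models and comparing the results) rather than trusting the label.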
1 vote, 1 answer

Computer Vision - Use Image Matching or OCR to recognize page of a text only book?

I want to be able to recognize which page of a text-only (no images) book I'm on. What is the best approach? I was initially thinking of some sort of image matching, but the pages of an all-text book look so similar that I'm not sure how well this would…
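If the book's text is available (or can be OCR'd once in advance), text matching sidesteps the problem that all-text pages look alike as images: OCR the current photo and pick the reference page whose text is most similar. This sketch uses Python's `difflib.SequenceMatcher` as the similarity measure; the `pages` dictionary, the 0.5 cut-off and the function names are assumptions.

```python
import difflib
import re
from typing import Dict, Optional, Tuple


def normalise(text: str) -> str:
    """Lowercase and collapse whitespace so OCR line breaks don't affect matching."""
    return re.sub(r"\s+", " ", text).strip().lower()


def best_page(ocr_text: str, pages: Dict[int, str],
              min_ratio: float = 0.5) -> Optional[Tuple[int, float]]:
    """Return (page number, similarity) of the closest reference page, or None."""
    query = normalise(ocr_text)
    best = None
    for number, page_text in pages.items():
        ratio = difflib.SequenceMatcher(None, query, normalise(page_text)).ratio()
        if best is None or ratio > best[1]:
            best = (number, ratio)
    if best is None or best[1] < min_ratio:
        return None  # nothing is similar enough to trust
    return best
```

Because the comparison is on characters rather than pixels, small OCR errors (such as `l` read for `I`) only lower the score slightly instead of breaking the match.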
1 vote, 2 answers

pytesseract: convert pictures of 7-segment numbers to text

I'm trying to convert pictures like this one (a 7-segment display) into text with pytesseract. I tried different PSM modes and a whitelist with only 0123456789, but the best output of pytesseract was '5' instead of '125'. Is there a way to configure pytesseract…
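Two adjustments are often tried for 7-segment digits: treat the image as a single text line (`--psm 7`) with a digit whitelist, and thicken the strokes beforehand so the gaps between segments close and each digit becomes one connected shape. The sketch below does the thickening as a plain NumPy dilation on a boolean ink mask (`True` = segment pixel); the radius and function names are assumptions, and `cv2.dilate` would do the same job if OpenCV is available.

```python
import numpy as np


def dilate_ink(mask: np.ndarray, radius: int = 2) -> np.ndarray:
    """Grow True (ink) pixels by a (2*radius+1)-wide square so separate segments merge."""
    h, w = mask.shape
    out = mask.copy()
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            if abs(dy) >= h or abs(dx) >= w:
                continue  # this shift would move every pixel outside the image
            dst = (slice(max(dy, 0), h + min(dy, 0)), slice(max(dx, 0), w + min(dx, 0)))
            src = (slice(max(-dy, 0), h + min(-dy, 0)), slice(max(-dx, 0), w + min(-dx, 0)))
            out[dst] |= mask[src]  # OR in the mask shifted by (dy, dx)
    return out


def digits_config(psm: int = 7, whitelist: str = "0123456789") -> str:
    """Config string for a single line of digits (psm 7 = treat image as one text line)."""
    return f"--psm {psm} -c tessedit_char_whitelist={whitelist}"
```

`np.where(mask, 0, 255).astype(np.uint8)` turns the dilated mask back into dark digits on a white background before calling `pytesseract.image_to_string(img, config=digits_config())`. Too large a radius will merge neighbouring digits, so it is worth checking the intermediate image.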
1 vote, 1 answer

C# - Tesseract OCR: scan multiple language at once

Any idea how to do it? TesseractEngine engine = new TesseractEngine("./tessdata", "eng", EngineMode.Default); Usually, for one language, just adding the abbreviation is enough. But what if I want to scan an image with multiple languages in it?…
Riiko
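Tesseract takes several languages as one `+`-separated string, e.g. `tesseract image.png out -l eng+fra`, as long as each `.traineddata` file is present in tessdata. As far as I know, the .NET wrapper passes the language argument through, so `new TesseractEngine("./tessdata", "eng+fra", EngineMode.Default)` should behave the same way. The sketch is in Python, where pytesseract's `lang=` parameter takes the same string; the helper name `tesseract_lang` is hypothetical.

```python
from typing import Iterable


def tesseract_lang(codes: Iterable[str]) -> str:
    """Join language codes into Tesseract's 'eng+fra' form, keeping order, dropping duplicates."""
    seen = []
    for code in codes:
        code = code.strip()
        if code and code not in seen:
            seen.append(code)
    if not seen:
        raise ValueError("at least one language code is required")
    return "+".join(seen)
```

For example, `pytesseract.image_to_string(img, lang=tesseract_lang(["eng", "fra"]))`. Loading more languages makes recognition slower, so it helps to list only those actually expected in the image.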