Questions tagged [tesseract]

Tesseract is an OCR (Optical Character Recognition) engine originally developed at HP Labs and now available as an open source library with development sponsored by Google.

Tesseract is an open-source, multilingual OCR (Optical Character Recognition) engine originally developed at HP Labs. It is now sponsored by Google and licensed under the Apache License 2.0. It currently recognizes 107 languages. Tesseract is primarily written in C++ and C. The project is hosted at https://github.com/tesseract-ocr/tesseract and its support forums are found at http://groups.google.com/group/tesseract-ocr.

4350 questions
1
vote
1 answer

Does CPython Enable Importing OpenCV and Tesseract-OCR in C# on Windows 10?

I'm trying to write a simple OCR .NET program using C# on Windows 10 without using Visual Studio. What I did was use IronPython plus a Python script. I've managed to pass bitmap data into a Python script and output it. However, when I tried to process it…
1
vote
1 answer

How do I select a different language in pytesseract? I tried a few methods but none worked

I have tried different ways to pass the language but none worked. Here is the code: dataframe_final=[] for i in range(len(boxes_list)): for j in range(len(boxes_list[i])): s='' if(len(boxes_list[i][j])==0): dataframe_final.append('…
1
vote
1 answer

How to fix a Python Tesseract error?

I need to use python tesseract to extract text from a photo: import pytesseract from PIL import Image img = Image.open('stest.png') pytesseract.pytesseract.tesseract_cmd = 'D:\\python\\venv\\Scripts\\pytesseract.exe' file_name =…
mark
1
vote
1 answer

OpenCV Python - remove small-point noise in a binarized image

I am building a document reader that parses all the text in a document into a Google spreadsheet. This script is supposed to save time in my work. The problem is that the binary image has a lot of noise (really small points around the text) that confuses pytesseract.…
TTT2
1
vote
1 answer

How to detect words in an image with OpenCV and Tesseract properly

I'm working on an application which reads an image file with OpenCV and processes the words on it with Tesseract. With the following code Tesseract detects extra rectangles which don't contain text. void…
Caner Kurt
1
vote
1 answer

Tesseract detects only 4 words from an image

I have very simple Python code: import cv2 import pytesseract pytesseract.pytesseract.tesseract_cmd = 'C:\\Tesseract-OCR\\tesseract.exe' img = cv2.imread('1.png') img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB) hImg,wImg,_ = img.shape #detecting…
Erhan
1
vote
2 answers

C# - Class Library - TESSERACT - Failed to find library "leptonica-1.80.0.dll" for platform x86

I'm writing an activity component to include as a NuGet package in UiPath. Here is the structure: └─lib └─net ├─(dll developed) ├─Tesseract.dll ├─x64 │ ├─leptonica-1.80.0.dll │ └─tesseract41.dll ├─x86 │ ├─leptonica-1.80.0.dll …
1
vote
1 answer

Tesseract Image.open for multi-page files

This code accesses a folder of single page .tif files and extracts textual data. data = [] data1 = [] listOfPages = glob.glob(r"C:/Users/name/folder/*.tif") for entry in listOfPages: if os.path.isfile(entry): filenames = entry …
id345678
1
vote
1 answer

Bad format in tr file, reading fontname, unichar

I am very new to Tesseract. I am following this tutorial on running a bash script to train data for Tesseract. I intend to create trained data for the BM Mini font. I have created a box file for my training image (I am only using one so far).…
1
vote
1 answer

pytesseract not recognizing symbols in front of letters

Trying to use pytesseract to read a few blocks of text but it isn't recognizing symbols when they are in front of or between words. It does however recognize the symbols when they are in front of numbers. Example: '#test $test %test' on the image…
1
vote
1 answer

Make cmd / PowerShell / Windows use a separate Tesseract installation instead of the one from Chocolatey

Some years ago I installed Tesseract with Chocolatey and forgot about it. Now, when I type a tesseract command in cmd, it uses that installation instead of a new one I installed separately. The problem with the one that was…
user17027702
1
vote
1 answer

Problem with the Image::OCR::Tesseract module

I have ActiveState Perl v5.8.8 installed, and I installed the Image::OCR::Tesseract module with ppm. When I try to run the following code: use Image::OCR::Tesseract 'get_ocr'; my $image = 'my_image.jp'; my $text = get_ocr($image); I…
Scott
1
vote
0 answers

Unable to resolve "line 93, in run() and line 89, in run anuncios.extend(anuncios_da_pagina) TypeError: 'NoneType' object is not iterable"

from converter import gif_to_png, image_to_text from file_helper import dictionary_list_to_csv from util import get_site_html, get_bsobj_from def get_anuncio(url_anuncio): print("Buscando " + url_anuncio) anuncio = {"url": url_anuncio} html_anuncio…
1
vote
1 answer

Convert many .pdf files to .txt files using the new Tesseract OCR engine in R

My supervisor wants me to convert .pdf files to .txt files to be processed by a keyword extraction algorithm. The .pdf files are scanned court documents. She essentially wants a folder called court_document with subdirectories each named a…
Yang Wu
1
vote
1 answer

Tess-two library not working on API 29 or higher

In my app I am using the tess-two library (tess-two: 9.1.+) for text recognition, and it works fine on API 28 and earlier. But when I test on emulators with API 29 or higher, the application closes, more precisely during the execution of the method…