Questions tagged [tesseract]

Tesseract is an OCR (Optical Character Recognition) engine originally developed at HP Labs and now available as an open source library with development sponsored by Google.

Tesseract is an open source, multilingual OCR (Optical Character Recognition) engine originally developed at HP Labs. It is now sponsored by Google and licensed under the Apache License 2.0. It currently recognizes 107 languages. Tesseract is primarily written in C++ and C. The project is hosted at https://github.com/tesseract-ocr/tesseract and its support forum is at http://groups.google.com/group/tesseract-ocr.

4350 questions

1 answer

Does CPython Enable Import of OpenCV and Tesseract-OCR in C# on Windows 10?

I'm trying to write a simple OCR .NET program using C# on Windows 10 without using Visual Studio. What I did was use IronPython plus a Python script. I've managed to pass bitmap data into a Python script and output it. However, when I tried to process it…

asked Nov 26 '21 at 12:57

ymir.tuskar01

1 answer

How do I select a different language in pytesseract? I tried a few methods but none worked

I have tried different ways to pass the language but none worked. Here is the code: dataframe_final=[] for i in range(len(boxes_list)): for j in range(len(boxes_list[i])): s='' if(len(boxes_list[i][j])==0): dataframe_final.append('…

ocr tesseract python-tesseract

asked Nov 22 '21 at 07:34

Faizan Ahmad
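
As a hedged illustration of the language-selection question above: the `tesseract` command-line tool takes its language through the `-l` flag, with several languages joined by `+`, and pytesseract exposes the same setting through its `lang` argument. The sketch below only builds the command line; `build_tesseract_command` and the file name `scan.png` are hypothetical names used for illustration, and the requested language data must be installed for the command to succeed.

```python
def build_tesseract_command(image_path, languages, executable="tesseract"):
    """Build an argv list that asks tesseract to OCR image_path to stdout.

    languages is a list of language codes such as ["eng", "deu"];
    tesseract expects them joined with "+" after the -l flag.
    """
    if not languages:
        raise ValueError("at least one language code is required")
    return [executable, image_path, "stdout", "-l", "+".join(languages)]


if __name__ == "__main__":
    # Hypothetical input file; pass the list to subprocess.run to execute it.
    print(build_tesseract_command("scan.png", ["eng", "deu"]))
```

If the language is passed correctly and recognition still fails, a likely thing to check is whether the matching `.traineddata` file is present in the tessdata directory.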

1 answer

How to fix a Python tesseract error?

I need to use python tesseract to extract text from a photo: import pytesseract from PIL import Image img = Image.open('stest.png') pytesseract.pytesseract.tesseract_cmd = 'D:\\python\\venv\\Scripts\\pytesseract.exe' file_name =…

python tesseract python-tesseract

asked Nov 20 '21 at 12:21

mark

1 answer

opencv python - remove small points noise in binarized image

I am building a document reader that parses all the text in a document into a Google spreadsheet. The script is supposed to save time in my work. The problem is that the binary image has a lot of noise (really small points around the text) that confuses pytesseract.…

python opencv image-processing tesseract python-tesseract

asked Nov 18 '21 at 20:44

TTT2
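
One common way to approach the small-point noise described in the question above is to drop connected foreground blobs below a minimum area. The following is a minimal pure-Python sketch of that idea, not the asker's code: it assumes the binarized image is a list of rows with 1 for foreground and 0 for background, and `remove_small_specks` and `min_area` are hypothetical names. With OpenCV the same idea is usually built on `cv2.connectedComponentsWithStats` or a morphological opening.

```python
from collections import deque


def remove_small_specks(binary, min_area=3):
    """Return a copy of a 0/1 grid with foreground blobs smaller than min_area cleared.

    Blobs are found with a breadth-first search using 8-connectivity.
    """
    rows = len(binary)
    cols = len(binary[0]) if rows else 0
    cleaned = [row[:] for row in binary]
    seen = [[False] * cols for _ in range(rows)]

    for r in range(rows):
        for c in range(cols):
            if binary[r][c] != 1 or seen[r][c]:
                continue
            # Collect every pixel of the blob that contains (r, c).
            component = []
            queue = deque([(r, c)])
            seen[r][c] = True
            while queue:
                y, x = queue.popleft()
                component.append((y, x))
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < rows and 0 <= nx < cols
                                and binary[ny][nx] == 1 and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            # Blobs below the area threshold are treated as noise.
            if len(component) < min_area:
                for y, x in component:
                    cleaned[y][x] = 0
    return cleaned
```

The threshold has to stay below the area of the smallest real glyph part (for example the dot of an "i"), otherwise text is erased along with the noise.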

1 answer

how to detect words in an image with OpenCV and Tesseract properly

I'm working on an application which reads an image file with OpenCV and processes the words on it with Tesseract. With the following code Tesseract detects extra rectangles which don't contain text. void…

c++ opencv tesseract

asked Nov 18 '21 at 12:51

Caner Kurt

1 answer

tesseract detects only 4 words from image

I have very simple python code: import cv2 import pytesseract pytesseract.pytesseract.tesseract_cmd = 'C:\\Tesseract-OCR\\tesseract.exe' img = cv2.imread('1.png') img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB) hImg,wImg,_ = img.shape #detecting…

python tesseract python-tesseract opencv

asked Nov 13 '21 at 11:03

Erhan

2 answers

C# - Class Library - TESSERACT - Failed to find library "leptonica-1.80.0.dll" for platform x86

I'm writing an activity component to include as a NuGet package in UiPath. Here is the structure:
└─lib
  └─net
    ├─(dll developed)
    ├─Tesseract.dll
    ├─x64
    │ ├─leptonica-1.80.0.dll
    │ └─tesseract41.dll
    ├─x86
    │ ├─leptonica-1.80.0.dll
    …

c# tesseract uipath

asked Nov 09 '21 at 11:53

Maurizio Bruccoleri

1 answer

tesseract Image.open for multi page files

This code accesses a folder of single page .tif files and extracts textual data. data = [] data1 = [] listOfPages = glob.glob(r"C:/Users/name/folder/*.tif") for entry in listOfPages: if os.path.isfile(entry): filenames = entry …

python tesseract python-tesseract

asked Oct 29 '21 at 07:45

id345678

1 answer

Bad format in tr file, reading fontname, unichar

I am very new to Tesseract. I am following this tutorial on running a bash script to train data for Tesseract. I intend to create trained data for the BM Mini font. I have created a box file for my training data image (I am only using one so far).…

bash tesseract

asked Oct 27 '21 at 23:24

user17223731

1 answer

pytesseract not recognizing symbols in front of letters

Trying to use pytesseract to read a few blocks of text but it isn't recognizing symbols when they are in front of or between words. It does however recognize the symbols when they are in front of numbers. Example: '#test $test %test' on the image…

python tesseract python-tesseract

asked Oct 19 '21 at 16:31

mattwatkins

1 answer

Make cmd / powershell / windows use a separate tesseract installation instead of the one from chocolatey

Some years ago I installed tesseract with chocolatey and forgot about it. Now I have the problem that when I type a tesseract command in cmd, it uses that installation instead of a new one I installed separately. The problem with the one that was…

windows powershell cmd tesseract chocolatey

asked Oct 17 '21 at 22:08

user17027702

1 answer

Problem with the module Image::OCR::Tesseract

I have ActiveState Perl v5.8.8 installed. I installed the module Image::OCR::Tesseract with ppm. When I try to run the following code: use Image::OCR::Tesseract 'get_ocr'; my $image = 'my_image.jp'; my $text = get_ocr($image); I…

perl cpan tesseract

asked Aug 05 '11 at 11:23

Scott

0 answers

unable to resolve " line 93, in run() and line 89, in run anuncios.extend(anuncios_da_pagina) TypeError: 'NoneType' object not iterable"

from converter import gif_to_png, image_to_text from file_helper import dictionary_list_to_csv from util import get_site_html, get_bsobj_from def get_anuncio(url_anuncio): print("Buscando " + url_anuncio) anuncio = {"url": url_anuncio} html_anuncio…

python tesseract

asked Sep 26 '21 at 14:47

asim-biosint
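
The error in the title above typically means that the value passed to `extend` was `None`, for instance because a helper returned nothing on some code path. The excerpt is truncated, so the actual cause in the asker's code is unknown. The sketch below reuses the names `anuncios`, `anuncios_da_pagina`, and `run` from the title; `get_anuncios_da_pagina` and its sample data are hypothetical stand-ins.

```python
def get_anuncios_da_pagina(page):
    """Hypothetical stand-in: returns a list, or None when a page yields nothing."""
    pages = {1: ["a", "b"], 2: None, 3: ["c"]}
    return pages.get(page)


def run(pages):
    anuncios = []
    for page in pages:
        anuncios_da_pagina = get_anuncios_da_pagina(page)
        # list.extend(None) raises TypeError, so fall back to an empty list.
        anuncios.extend(anuncios_da_pagina or [])
    return anuncios


if __name__ == "__main__":
    print(run([1, 2, 3]))
```

The guard only hides the symptom; making the helper return an empty list on every path is usually the cleaner fix.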

1 answer

Convert many .pdf files to .txt files using the new Tesseract OCR engine in R

My supervisor wants me to convert .pdf files to .txt files to be processed by a keyword extraction algorithm. The .pdf files are scanned court documents. She essentially wants a folder called court_document with subdirectories each named a…

r ocr tesseract file-conversion

asked Sep 24 '21 at 08:18

Yang Wu

1 answer

Library Tess-two not working in API 29 or higher

In my app I am using the Tess-two (tess-two: 9.1. +) library to do text recognition, and it works fine on API 28 and earlier. But when I test on emulators with API 29 or higher, the application closes, more precisely during the execution of the method…

android tesseract

asked Sep 20 '21 at 13:12

Andrea Grassano
