Tesseract OCR error, probably because of traineddata

Asked Oct 17 '19 at 13:22

Active Oct 17 '19 at 13:43

Viewed 333 times

I am using the pytesseract wrapper with the legacy Tesseract engine (`--oem 0`). This is the code I use to extract text from an image:

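import cv2
import pytesseract

img = cv2.imread("img.png")  # image loaded with OpenCV, as noted in the comments

try:
    # Extract OCR data from the image using the legacy engine (--oem 0)
    ocr_data = pytesseract.image_to_data(
        img, lang="eng", output_type=pytesseract.Output.DATAFRAME,
        config="--oem 0"
    )
except Exception as e:
    print("Trace:", e)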

Error trace:

Trace: Tesseract Open Source OCR Engine v4.0.1 with Leptonica
Warning: Invalid resolution 0 dpi. Using 70 instead.
Estimating resolution as 389
tesseract: intmatcher.cpp:1160: void ScratchEvidence::UpdateSumOfProtoEvidences(INT_CLASS, BIT_VECTOR): Assertion `ClassTemplate->ProtoLengths[ActualProtoNum] < MAX_PROTO_INDEX' failed.
Aborted (core dumped)
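The trace mixes a harmless resolution warning with the fatal assertion, so it can help to pull out just the assertion details when searching for similar reports. The following is a minimal sketch using only the standard library; the helper name `parse_assertion` and the regular expression are my own illustration and assume the usual glibc `file:line: function: Assertion ... failed` layout seen above.

```python
import re

# Matches a glibc-style assertion message:
#   <file>:<line>: <function signature>: Assertion `<condition>' failed
ASSERT_RE = re.compile(
    r"(?P<file>[\w./-]+\.(?:cpp|cc|c|h)):(?P<line>\d+):\s*"
    r"(?P<func>.*?):\s*Assertion [`'](?P<cond>.*?)' failed"
)


def parse_assertion(trace):
    """Return file, line, function and condition of the first failed assertion, or None."""
    match = ASSERT_RE.search(trace)
    if match is None:
        return None
    info = match.groupdict()
    info["line"] = int(info["line"])
    return info


# The error text from the question, joined into one string.
trace = (
    "Tesseract Open Source OCR Engine v4.0.1 with Leptonica "
    "Warning: Invalid resolution 0 dpi. Using 70 instead. "
    "Estimating resolution as 389 "
    "tesseract: intmatcher.cpp:1160: void ScratchEvidence::"
    "UpdateSumOfProtoEvidences(INT_CLASS, BIT_VECTOR): "
    "Assertion `ClassTemplate->ProtoLengths[ActualProtoNum] < MAX_PROTO_INDEX' failed. "
    "Aborted (core dumped)"
)

info = parse_assertion(trace)
print(info)
```

The extracted file name, line number and condition make a more precise search query than the full trace.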

I have also tried the tesseract command-line tool directly and get exactly the same error. The command I used:

tesseract img.png out --oem 0 -l eng
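To narrow this down, the same command could be run once per engine mode to see whether the crash is specific to the legacy engine. This is only a sketch: `build_command` and `run_tesseract` are hypothetical helper names, and it assumes the `tesseract` executable is on `PATH` and that `img.png` is in the current directory. `--oem 1` selects the LSTM-only engine.

```python
import shutil
import subprocess


def build_command(image_path, output_base, oem=0, lang="eng", exe="tesseract"):
    """Build the argument list for: tesseract <image> <output base> --oem <n> -l <lang>."""
    return [exe, image_path, output_base, "--oem", str(oem), "-l", lang]


def run_tesseract(image_path, output_base, oem=0, lang="eng"):
    """Run tesseract and return (return code, stderr text)."""
    cmd = build_command(image_path, output_base, oem=oem, lang=lang)
    if shutil.which(cmd[0]) is None:
        raise FileNotFoundError("tesseract executable not found on PATH")
    result = subprocess.run(cmd, capture_output=True, text=True)
    return result.returncode, result.stderr


if __name__ == "__main__":
    # Compare the legacy engine (0) with the LSTM engine (1) on the same image.
    for oem in (0, 1):
        try:
            code, stderr = run_tesseract("img.png", f"out_oem{oem}", oem=oem)
        except FileNotFoundError as exc:
            print(exc)
            break
        print(f"--oem {oem}: return code {code}")
        print(stderr)
```

On POSIX systems, a negative return code from `subprocess.run` means the process was terminated by a signal, which is what an abort after a failed assertion would look like.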

I am using the tessdata files from this repository: https://github.com/tesseract-ocr/tessdata
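Since the traineddata is the suspect, one thing worth confirming is which `eng.traineddata` file is actually on disk where Tesseract looks for it. The sketch below is an assumption-laden check, not a diagnosis: `find_traineddata` is a hypothetical helper, and it assumes the `TESSDATA_PREFIX` environment variable is set and points either at the tessdata directory or at its parent.

```python
import os
from pathlib import Path


def find_traineddata(lang="eng", search_dirs=None):
    """Return (path, size in bytes) for each <lang>.traineddata found in search_dirs."""
    if search_dirs is None:
        prefix = os.environ.get("TESSDATA_PREFIX")
        # Assumption: the prefix is the tessdata directory itself or its parent.
        search_dirs = [prefix, os.path.join(prefix, "tessdata")] if prefix else []
    found = []
    for directory in search_dirs:
        candidate = Path(directory) / f"{lang}.traineddata"
        if candidate.is_file():
            found.append((str(candidate), candidate.stat().st_size))
    return found


if __name__ == "__main__":
    matches = find_traineddata("eng")
    if not matches:
        print("No eng.traineddata found via TESSDATA_PREFIX")
    for path, size in matches:
        print(f"{path}: {size} bytes")
```

Comparing the reported size with the file downloaded from the repository would show whether a different or stale copy is being picked up.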

I searched on Google but couldn't find anything helpful.

edited Oct 17 '19 at 13:43

M Asad Ali

Could you show us the type of the `img` variable? – 404pio Oct 17 '19 at 13:32
The image is loaded using cv2.imread("img.png"). – M Asad Ali Oct 17 '19 at 13:37

0 Answers