I am using the pytesseract wrapper with the legacy Tesseract engine (--oem 0). This is the code I use to extract text from an image:
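import pytesseract
from PIL import Image

# assumption: the same image file as in the command-line example below
img = Image.open("img.png")

try:
    # extracting OCR data from the image (Output.DATAFRAME needs pandas installed)
    ocr_data = pytesseract.image_to_data(
        img, lang="eng", output_type=pytesseract.Output.DATAFRAME,
        config="--oem 0"
    )
except Exception as e:
    print("Trace:", e)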
Error trace:
Trace: Tesseract Open Source OCR Engine v4.0.1 with Leptonica
Warning: Invalid resolution 0 dpi. Using 70 instead.
Estimating resolution as 389
tesseract: intmatcher.cpp:1160: void ScratchEvidence::UpdateSumOfProtoEvidences(INT_CLASS, BIT_VECTOR): Assertion `ClassTemplate->ProtoLengths[ActualProtoNum] < MAX_PROTO_INDEX' failed.
Aborted (core dumped)
I also tried the tesseract command-line tool directly and got exactly the same error. Command used:
tesseract img.png out --oem 0 -l eng
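To narrow down whether the crash is specific to the legacy engine, a minimal sketch like the following runs the same command from Python, first with --oem 0 and then with --oem 1 (LSTM only), and reports how each process ended. It assumes tesseract is on PATH and the image is img.png; the helper names (build_command, describe_returncode, run_engine) and the out_oem0/out_oem1 output names are hypothetical. On POSIX, Python reports a process killed by a signal as a negative return code, so an aborted run should show up as SIGABRT.

```python
import signal
import subprocess
import sys


def build_command(image_path, out_base, oem, lang="eng"):
    """Build the same tesseract command line as above, with a configurable --oem."""
    return ["tesseract", image_path, out_base, "--oem", str(oem), "-l", lang]


def describe_returncode(returncode):
    """Turn a subprocess return code into a short human-readable description.

    On POSIX, subprocess reports a process killed by a signal as a negative
    return code, so an assertion failure that aborts tesseract shows up as
    -SIGABRT.
    """
    if returncode is None:
        return "not run"
    if returncode == 0:
        return "ok"
    if returncode < 0:
        try:
            name = signal.Signals(-returncode).name
        except ValueError:
            name = f"signal {-returncode}"
        return f"killed by {name}"
    return f"exited with status {returncode}"


def run_engine(image_path, oem):
    """Run tesseract once with the given --oem; return (returncode, stderr)."""
    # Hypothetical output base names so the two runs don't overwrite each other.
    cmd = build_command(image_path, f"out_oem{oem}", oem)
    try:
        result = subprocess.run(cmd, capture_output=True, text=True)
    except FileNotFoundError:
        return None, "tesseract executable not found on PATH"
    return result.returncode, result.stderr


if __name__ == "__main__":
    image = sys.argv[1] if len(sys.argv) > 1 else "img.png"
    # --oem 0 = legacy engine only, --oem 1 = LSTM engine only
    for oem in (0, 1):
        code, stderr = run_engine(image, oem)
        print(f"--oem {oem}: {describe_returncode(code)}")
        if code != 0 and stderr.strip():
            print(stderr.strip())
```

Pass the image path as the first argument (it defaults to img.png). If only the --oem 0 run ends with SIGABRT, that points at the legacy engine or its data rather than at the image or the Python wrapper.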
I am using the tessdata files from this repository: https://github.com/tesseract-ocr/tessdata
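To confirm which tesseract build and which language data are actually being picked up (for example, that eng comes from the files downloaded from that repository), a small sketch can print the TESSDATA_PREFIX environment variable and the output of tesseract --version and tesseract --list-langs. The helper names (parse_version, parse_langs, run) are hypothetical, and the parsing assumes output shaped like "tesseract 4.0.1" and a "List of available languages" header followed by one code per line; the exact wording varies between tesseract versions.

```python
import os
import re
import subprocess


def parse_version(text):
    """Extract a version string such as '4.0.1' from `tesseract --version` output."""
    match = re.search(r"tesseract\s+v?(\d+(?:\.\d+)+)", text, re.IGNORECASE)
    return match.group(1) if match else None


def parse_langs(text):
    """Return language codes from `tesseract --list-langs` output.

    Assumes a header line starting with 'List of available languages'
    followed by one language code per line.
    """
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    return [line for line in lines if not line.startswith("List of available languages")]


def run(args):
    """Run tesseract with the given arguments; return its output, or None if not found."""
    try:
        result = subprocess.run(["tesseract", *args], capture_output=True, text=True)
    except FileNotFoundError:
        return None
    # Some builds print this information on stderr instead of stdout.
    return result.stdout or result.stderr


if __name__ == "__main__":
    print("TESSDATA_PREFIX =", os.environ.get("TESSDATA_PREFIX", "(not set)"))
    version_text = run(["--version"])
    if version_text is None:
        print("tesseract executable not found on PATH")
    else:
        print("tesseract version:", parse_version(version_text))
        print("languages:", parse_langs(run(["--list-langs"]) or ""))
```

If the reported version or language list differs from what you expect, the command line and pytesseract may be loading a different eng.traineddata than the one you downloaded.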
I searched on Google but couldn't find anything that helps.