Tesseract, Tess4J - improve OCR output on low DPI images

Asked Dec 21 '17 at 18:33

Active Dec 21 '17 at 18:33

Viewed 723 times

I use Tesseract through its JNA wrapper Tess4J in my Java application.

I'm trying to OCR JPEG images at 120 DPI. The output text is of fairly low quality, and many important words are not recognized correctly. I think the main problem is the 120 DPI resolution of my input images. I tried upscaling the images by 2x. It helped, but only slightly, and the result is still far from perfect.
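For reference, a minimal sketch of what a 2x rescale could look like using only the JDK. The post does not say which method was actually used, so the class name `Upscale`, the bicubic interpolation, and the grayscale output type are all assumptions made for illustration.

```java
import java.awt.Graphics2D;
import java.awt.RenderingHints;
import java.awt.image.BufferedImage;

// Hypothetical helper, not part of Tess4J or Tesseract.
class Upscale {

    /** Returns a grayscale copy of src, scaled by factor with bicubic interpolation. */
    static BufferedImage scale(BufferedImage src, double factor) {
        int w = (int) Math.round(src.getWidth() * factor);
        int h = (int) Math.round(src.getHeight() * factor);

        // Assumption: grayscale output is acceptable for OCR input.
        BufferedImage dst = new BufferedImage(w, h, BufferedImage.TYPE_BYTE_GRAY);
        Graphics2D g = dst.createGraphics();
        g.setRenderingHint(RenderingHints.KEY_INTERPOLATION,
                RenderingHints.VALUE_INTERPOLATION_BICUBIC);
        g.drawImage(src, 0, 0, w, h, null); // draw the source stretched to w x h
        g.dispose();
        return dst;
    }

    public static void main(String[] args) {
        // Blank stand-in for a scanned page; a real run would load the JPEG with ImageIO.read.
        BufferedImage page = new BufferedImage(100, 50, BufferedImage.TYPE_INT_RGB);
        BufferedImage bigger = scale(page, 2.0);
        System.out.println(bigger.getWidth() + "x" + bigger.getHeight());
    }
}
```

The resulting `BufferedImage` could then be handed to Tess4J's `doOCR`, which has an overload that accepts a `BufferedImage`.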

Right now I'm looking for other options with Tesseract in order to improve the OCR quality of my data.

My images contain health care information, so I'm wondering: if I provide a custom dictionary of medical words, will it help improve the OCR quality? If so, how will a dictionary of 100k terms affect Tesseract's performance?

Please show how to provide this dictionary with Tess4J.
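To make the request concrete, here is a rough sketch of the file layout Tesseract's user-words mechanism appears to expect: a `<lang>.user-words` file in the tessdata directory plus a config file that sets the `user_words_suffix` variable. The class name `UserWordsSetup`, the config name `medical`, and the sample terms are hypothetical. The sketch only prepares the files and does not call Tess4J.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of Tess4J or Tesseract.
class UserWordsSetup {

    /**
     * Writes tessdata/<lang>.user-words (one term per line) and
     * tessdata/configs/<configName>; returns the path of the config file.
     */
    static Path write(Path tessdata, String lang, String configName,
                      List<String> words) throws IOException {
        Path configs = tessdata.resolve("configs");
        Files.createDirectories(configs);

        // Word list: plain text, one dictionary term per line.
        Files.write(tessdata.resolve(lang + ".user-words"), words, StandardCharsets.UTF_8);

        // Config file: tells Tesseract which suffix to look for next to the language data.
        Path config = configs.resolve(configName);
        Files.write(config, Arrays.asList("user_words_suffix user-words"),
                StandardCharsets.UTF_8);
        return config;
    }

    public static void main(String[] args) throws IOException {
        Path tessdata = Files.createTempDirectory("tessdata"); // stand-in for the real tessdata dir
        Path config = write(tessdata, "eng", "medical",
                Arrays.asList("metformin", "tachycardia"));
        System.out.println("Wrote " + config);
    }
}
```

Presumably the config name would then be passed to Tess4J through `Tesseract.setConfigs(...)`, but that step is untested here, and whether the word list is honoured at all likely depends on the Tesseract version and OCR engine mode.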

What other options should I try?

asked Dec 21 '17 at 18:33

alexanoid


Can you provide example image? – Dmitrii Z. Dec 22 '17 at 09:36


0 Answers