I'm using Tess4j to recognize text in an image, but I'm having serious problems with recognition accuracy.
I've already experimented with preprocessing the image using the OpenCV tools; that helped somewhat, but the problem is still not solved.
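For context, here is roughly what my current pipeline looks like. This is a simplified sketch: the file names, the tessdata path, and the exact OpenCV preprocessing steps (grayscale, 2x upscale, Otsu binarization) are placeholders for what I actually run.

    import java.io.File;

    import net.sourceforge.tess4j.Tesseract;
    import net.sourceforge.tess4j.TesseractException;

    import org.opencv.core.Core;
    import org.opencv.core.Mat;
    import org.opencv.core.Size;
    import org.opencv.imgcodecs.Imgcodecs;
    import org.opencv.imgproc.Imgproc;

    public class OcrTest {
        public static void main(String[] args) throws TesseractException {
            // Load the OpenCV native library (java.library.path setup omitted)
            System.loadLibrary(Core.NATIVE_LIBRARY_NAME);

            // Read the source image and convert it to grayscale
            Mat src = Imgcodecs.imread("input.png");
            Mat gray = new Mat();
            Imgproc.cvtColor(src, gray, Imgproc.COLOR_BGR2GRAY);

            // Upscale 2x so the glyphs are closer to the size Tesseract works best with
            Mat scaled = new Mat();
            Imgproc.resize(gray, scaled, new Size(), 2.0, 2.0, Imgproc.INTER_CUBIC);

            // Otsu binarization to get clean black-on-white text
            Mat binary = new Mat();
            Imgproc.threshold(scaled, binary, 0, 255,
                    Imgproc.THRESH_BINARY | Imgproc.THRESH_OTSU);
            Imgcodecs.imwrite("preprocessed.png", binary);

            // Run Tess4j on the preprocessed image
            Tesseract tesseract = new Tesseract();
            tesseract.setDatapath("tessdata");   // folder containing eng.traineddata
            tesseract.setLanguage("eng");
            tesseract.setPageSegMode(6);         // assume a single uniform block of text

            String result = tesseract.doOCR(new File("preprocessed.png"));
            System.out.println(result);
        }
    }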
I also tried converting the image from PNG to SVG with ImageTracer (from jankovicsandras) to make the glyphs cleaner, but that didn't help either.
Googling, the only other option I found is to train Tesseract on the font used in the images I want to convert. Since that would take some time, I'd like to ask here first in case there are other ideas.
PS: It would take time because I don't have Linux, so from Windows I would have to set up WSL first.
The wiki says:
"The existing model data provided has been trained on about 400000 textlines spanning about 4500 fonts"
so it's possible that it simply can't recognize this font (which is quite basic):
Do you think it makes sense to train the OCR for a specific font?
Thanks for your help!