Surprisingly large difference between tess4j and tess-two

Asked Apr 07 '16 at 16:22

Active Apr 07 '16 at 16:22

Viewed 310 times

tess-two seems to work pretty well when I know EXACTLY the location on screen where the text I want to OCR is.

Now I'm trying to scan for text against a busy background, and it's not working nearly as well.

[image: annotated Android screenshot]

I built a stand-alone driver using tess4j and I get significantly better results:

[image: annotated desktop screenshot]

I'm using tess-two 5.4.1 and tess4j 3.0.

To scan, I'm using TessBaseAPI.PageSegMode.PSM_SPARSE_TEXT_OSD and iterating with TessBaseAPI.PageIteratorLevel.RIL_TEXTLINE.

Any ideas why the results are so different? Does tess4j do some kind of preprocessing that is leading to a better result?
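One experiment that might help narrow this down is to apply the same explicit preprocessing on both sides, so that both libraries see identical input. Below is a minimal sketch, assuming packed ARGB pixel arrays and using grayscale conversion plus Otsu thresholding. The class and method names (`OcrPreprocess`, `toGray`, `otsuThreshold`, `binarize`) are hypothetical and belong to neither tess-two nor tess4j, and I'm not claiming either library does this internally.

```java
// Hypothetical helper for illustration; not part of tess-two or tess4j.
final class OcrPreprocess {

    /** Converts packed ARGB pixels to 8-bit luminance values (0..255). */
    static int[] toGray(int[] argb) {
        int[] gray = new int[argb.length];
        for (int i = 0; i < argb.length; i++) {
            int r = (argb[i] >> 16) & 0xFF;
            int g = (argb[i] >> 8) & 0xFF;
            int b = argb[i] & 0xFF;
            // Integer approximation of the Rec. 601 luma weights.
            gray[i] = (299 * r + 587 * g + 114 * b) / 1000;
        }
        return gray;
    }

    /** Otsu's method: the threshold that maximizes between-class variance. */
    static int otsuThreshold(int[] gray) {
        int[] hist = new int[256];
        for (int v : gray) {
            hist[v]++;
        }
        long total = gray.length;
        long sumAll = 0;
        for (int t = 0; t < 256; t++) {
            sumAll += (long) t * hist[t];
        }
        long sumBg = 0;
        long weightBg = 0;
        double bestVar = -1.0;
        int best = 0;
        for (int t = 0; t < 256; t++) {
            weightBg += hist[t];
            if (weightBg == 0) {
                continue; // no pixels at or below t yet
            }
            long weightFg = total - weightBg;
            if (weightFg == 0) {
                break; // every pixel is already in the dark class
            }
            sumBg += (long) t * hist[t];
            double meanBg = (double) sumBg / weightBg;
            double meanFg = (double) (sumAll - sumBg) / weightFg;
            double diff = meanBg - meanFg;
            double between = (double) weightBg * weightFg * diff * diff;
            if (between > bestVar) {
                bestVar = between;
                best = t;
            }
        }
        return best;
    }

    /** Returns opaque black/white ARGB pixels; values above the threshold become white. */
    static int[] binarize(int[] argb) {
        int[] gray = toGray(argb);
        int threshold = otsuThreshold(gray);
        int[] out = new int[argb.length];
        for (int i = 0; i < argb.length; i++) {
            out[i] = gray[i] > threshold ? 0xFFFFFFFF : 0xFF000000;
        }
        return out;
    }
}
```

On Android the input array could come from `Bitmap.getPixels`, and on the desktop from `BufferedImage.getRGB`. If the results still differ after feeding both libraries the same binarized image, the difference is more likely in the engines themselves than in the input handling.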

Thanks in advance!

steve

Interesting find. They use different versions of Tesseract, so that accounts for at least some of the difference. – rmtheis Apr 14 '16 at 14:33

0 Answers