
I am trying to use the Tesseract library to extract text from an image. It works in some cases, but in most cases it fails.

I am using this library in my Android project: https://github.com/rmtheis/tess-two

I am testing with this image:

[image not shown]

Actual Result

[image not shown]

Expected Result

Wikipedia

The free Encyclopedia

Any suggestions as to why it's not working?

Bot
  • There might be many reasons for that. Did you try googling `Tesseract image optimization` or similar? – FD_ Apr 02 '14 at 17:22
  • Yes, but I did not find anything that made sense. There is an app on Google Play, https://play.google.com/store/apps/details?id=com.smartmobilesoftware.mobileocrfree, which works in most cases, but I don't know what library it uses or what algorithm is behind it – Bot Apr 02 '14 at 17:26
  • I guess the linked app does the actual OCR on a server. The app's size is quite small, likely too small to contain an OCR engine with training data etc, and the app needs the Internet permission. – FD_ Apr 02 '14 at 17:29
  • Yes, it does, but how is it doing it? Is there any code for this in PHP, Java, Python, or another language? – Bot Apr 02 '14 at 17:32
  • Looks like uneven illumination and stylized text. Did you take a picture of your screen, or use the image directly? – rmtheis Apr 04 '14 at 19:32
  • I took a photo with my Android phone of the Wiki page open in Google Chrome on my laptop, then started processing it – Bot Apr 05 '14 at 14:42

1 Answer


It's not working because of:

  • The uneven illumination in the image
  • The presence of part of the globe graphic at the top of the captured image.

Taking a picture of the screen introduces darker areas into the captured image, so the brightness of the background varies from one part of the image to another. To fix it, you could use the image directly instead of taking a picture, or you could add code to your app to adjust for the uneven illumination.
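As a rough illustration of what "adjusting for the uneven illumination" could look like, here is a minimal sketch of adaptive (local mean) thresholding in plain Java. It is not part of tess-two and not necessarily what produced the result shown in this answer; the class name `AdaptiveThreshold`, the method `binarize`, and the `window` and `percent` parameters are all made up for this example. The idea is to compare each pixel with the average brightness of its own neighborhood instead of with one global cutoff, so text in a dim corner and text in a bright corner are both separated from their local background.

```java
// Hypothetical helper, not part of tess-two: local mean thresholding
// computed with an integral image (summed-area table).
final class AdaptiveThreshold {

    /**
     * Binarizes a grayscale image given as gray[row][col] with values 0..255.
     * A pixel becomes black (0) when it is more than `percent` percent darker
     * than the mean of the window-by-window neighborhood centered on it;
     * otherwise it becomes white (255). The window is clipped at the borders.
     */
    static int[][] binarize(int[][] gray, int window, int percent) {
        int h = gray.length;
        int w = gray[0].length;

        // integral[y][x] holds the sum of all pixels in rows < y and cols < x.
        long[][] integral = new long[h + 1][w + 1];
        for (int y = 0; y < h; y++) {
            long rowSum = 0;
            for (int x = 0; x < w; x++) {
                rowSum += gray[y][x];
                integral[y + 1][x + 1] = integral[y][x + 1] + rowSum;
            }
        }

        int half = window / 2;
        int[][] out = new int[h][w];
        for (int y = 0; y < h; y++) {
            int y0 = Math.max(0, y - half);
            int y1 = Math.min(h - 1, y + half);
            for (int x = 0; x < w; x++) {
                int x0 = Math.max(0, x - half);
                int x1 = Math.min(w - 1, x + half);
                long count = (long) (y1 - y0 + 1) * (x1 - x0 + 1);
                long sum = integral[y1 + 1][x1 + 1] - integral[y0][x1 + 1]
                         - integral[y1 + 1][x0] + integral[y0][x0];
                // Integer form of: gray < localMean * (1 - percent / 100)
                boolean dark = (long) gray[y][x] * count * 100 < sum * (100 - percent);
                out[y][x] = dark ? 0 : 255;
            }
        }
        return out;
    }
}
```

On Android you would first need to turn the `Bitmap` into grayscale values (for example from the array filled by `Bitmap.getPixels`) and write the result back into a bitmap before handing it to Tesseract. The window size and percentage are tuning values that depend on the text size in your photos, so treat any particular numbers as a starting point only.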

With different illumination, and cropping around the text area, I get a better result:

OCR result showing correct recognition

rmtheis