
Hello, I'm trying to use Tesseract OCR to recognize some letters in an image.

I converted the image with ImageMagick and the result looks good, but it's still not enough.

The original images:

[image]

The ImageMagick command used for the conversion:

convert input.jpg -fuzz 50% -fill black -opaque black -bordercolor white -border 2 -fill black -draw "color 0,0 floodfill" -alpha off -negate -units pixelsperinch -density 72 output.jpg

The result images:

[image]

The Tesseract command:

$ tesseract output.jpg out -psm 7

Output/result (expected -> recognized):

Tesseract Open Source OCR Engine v4.00.00alpha with Leptonica Page 1

Text: AUGU -> AUOU
Text: VEGU -> VOR-OU
Text: EGUV -> E6UV
Text: USEA -> USSOEA

iehrlich
J. Metal
  • Your problem is likely due to rotated letters and numbers. My understanding is that OCR generally does not like rotated characters. It expects characters to be properly oriented for best recognition. But I am not an OCR expert, so I will defer to others who may know more. – fmw42 Jul 05 '17 at 04:09
  • CONTINUED: Try an example that has letters that are not rotated. Does that work? – fmw42 Jul 05 '17 at 04:48
  • I got it working with another version of tesseract, thank you! – J. Metal Jul 07 '17 at 06:50

1 Answer


Not sure if it was pure luck, as you have only provided a single image to test with, but I noticed you are using a noisy/fuzzy JPEG instead of a nice clean PNG, so I thresholded your image at 50% and made a PNG of it and it recognises all four letters correctly:

convert yourImage.jpeg -threshold 50% clean.png
tesseract -psm 7 clean.png out
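
As a rough sketch only, the same two steps could be applied to a whole set of images with a small shell helper. The function name `ocr_cmds` and the `_clean.png` suffix are hypothetical choices made for illustration, and the helper assumes file names without spaces. It only prints the commands, so they can be checked before anything is run. The `-psm` spelling matches the versions discussed here; newer Tesseract releases spell the option `--psm`.

```shell
#!/bin/sh
# Sketch: print the threshold-then-OCR commands for each image given.
# ocr_cmds is a hypothetical helper name, not part of ImageMagick or Tesseract.
# Assumes file names contain no spaces or shell metacharacters.
ocr_cmds() {
    for img in "$@"; do
        base="${img%.*}"   # strip the extension, e.g. a.jpg -> a
        # Step 1: threshold at 50% and write a clean PNG.
        printf 'convert %s -threshold 50%% %s_clean.png\n' "$img" "$base"
        # Step 2: OCR the PNG as a single line of text (page segmentation mode 7).
        printf 'tesseract -psm 7 %s_clean.png %s\n' "$base" "$base"
    done
}
```

Running `ocr_cmds *.jpg` lists the commands; piping that output to `sh` would execute them, with Tesseract writing one `.txt` file per image.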
Mark Setchell
  • Tried, without success. What version of tesseract did you use? The result was `u s o: A` – J. Metal Jul 06 '17 at 13:52
  • `$ convert image_test.jpg -threshold 50% clean.png` `$ tesseract clean.png out -psm 7 && cat out.txt` `Tesseract Open Source OCR Engine v4.00.00alpha with Leptonica Warning. Invalid resolution 0 dpi. Using 70 instead. u s o: A` – J. Metal Jul 06 '17 at 13:53
  • Mine is v3.05.01 with leptonica 1.74.1 – Mark Setchell Jul 06 '17 at 15:44
  • Thank you, it worked fine with this version. It recognized 13 of 15 images! – J. Metal Jul 07 '17 at 06:49