
I have screen images that contain numeric values, and I want to recognize the digits with Tesseract 4.0. However, the digits are drawn with dashed lines, like those on a seven-segment display, and Tesseract can't recognize them because of the gaps. When I joined the dashed lines into solid strokes in GIMP, Tesseract recognized the values almost correctly. I want to do the same with OpenCV. How can I join the dashed lines of the digits into one piece?

[Images: before joining process; after joining process]

Ugurcan
  • Threshold your image to make it black/white. That should help tesseract. – fmw42 May 19 '20 at 21:58
  • Thank you. It helped, but a fixed threshold fails on other images. How can I make it adaptive? I am actually using YOLO to extract screen regions from images, and my plan is to preprocess those extracted regions before feeding them to Tesseract. Apparently, this preprocessing step will negatively affect my data pipeline. – Ugurcan May 19 '20 at 22:19
  • Try either Otsu thresholding or adaptive thresholding. – fmw42 May 19 '20 at 22:55

1 Answer


In my experience, Tesseract should recognize these numbers easily without any preprocessing.
Is it possible that the image is simply zoomed in too much, so that the numbers are too big for Tesseract to recognize? I would try addressing that first; if it doesn't help, then you can look into morphological transformations in OpenCV.

Karol Żak
  • Thank you. You are right. I just converted the image to a binary image and then to grayscale. Tesseract works well except in a few cases. It still can't detect the . (dot) between numbers. Do you have any suggestions for that? Also, I have images taken under different lighting conditions, and a fixed threshold value fails on some of them. – Ugurcan May 19 '20 at 22:16
  • You probably can't detect `dots` because this character is not whitelisted in your config: `config="-c tessedit_char_whitelist=0123456789."` – Karol Żak May 19 '20 at 22:51
  • @Ugurcan, another thing is that you're using `--psm 8`, which won't work for you. I would suggest using 7 or 6 instead. All the modes are listed [here](https://github.com/tesseract-ocr/tesseract/blob/master/doc/tesseract.1.asc) – Karol Żak May 19 '20 at 23:08
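
Putting the two suggestions from these comments together, a call through the `pytesseract` wrapper might look roughly like the sketch below. It assumes `pytesseract` and the Tesseract binary are installed; the helper names `build_tesseract_config` and `read_digits` are made up for the example. One caveat to verify for your version: as far as I know, the character whitelist is not honoured by the LSTM engine in Tesseract 4.0 (support returned in 4.1), so on 4.0 it may only take effect with the legacy engine.

```python
def build_tesseract_config(psm=7, whitelist="0123456789."):
    """Build a config string for digits and dots.

    --psm 7 treats the image as a single text line; --psm 6 treats it
    as a single uniform block of text.
    """
    return f"--psm {psm} -c tessedit_char_whitelist={whitelist}"


def read_digits(image, psm=7):
    """OCR an image (NumPy array or PIL image) with the config above."""
    # Imported here so the config helper works without Tesseract installed.
    import pytesseract
    text = pytesseract.image_to_string(image,
                                       config=build_tesseract_config(psm))
    return text.strip()


print(build_tesseract_config())
```

`read_digits` would then be called on each preprocessed screen region, for example on the binarized crop produced earlier in the pipeline.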