Extract Data from an Image with Python/OpenCV/Tesseract?

Question

I'm trying to extract some contents from a cropped image. I tried pytesseract and OpenCV template matching, but the results are very poor. Template matching sometimes fails because of the poor quality of the icons, and Tesseract gives me a line of text with wrong characters.

I'm trying to grab the values like this:

0:26 83 1 1

Any thoughts or techniques?

score 1 · Answer 1 · answered Apr 10 '20 at 02:53

One technique you could try is to blur the image first. From the look of it, the image is already fairly low-res and blurry, so you wouldn't need a strong blur. Whenever I need a blur in OpenCV, I normally choose the Gaussian blur, since it replaces each pixel with a weighted average of itself and its neighbors, which smooths out noise well. Once the image is blurred, I would threshold it, using either a plain or an adaptive threshold. At that point the image should be mostly hard lines, with a few short stray lines mixed in between. Next, dilate the thresholded image just enough that the areas with a lot of hard edges connect into single blobs. After dilating, find the contours of that image and sort them by their vertical position (height) within the image. Since I assume the position of those numbers won't change, sorting by vertical position is all you need. Finally, create bounding boxes around the sorted contours and read the text from each box.

However, if you want to do this the quick and dirty way, you can always just manually define your own ROIs around each area you want to read and run the OCR on those.

First Method

Gaussian blur the image
Threshold the image
Dilate the image
Find Contours
Sort Contours based on height
Create bounding boxes around relevant contours

Second Method

Manually create ROIs around the areas you want to read text from
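
For the second method, something along these lines might work. Again just a sketch: `read_rois` is a made-up helper, and the ROI names and coordinates in the usage note are hypothetical and have to be measured on your own image. By default it calls `pytesseract.image_to_string` with `--psm 7` (treat the crop as a single line of text), which is my assumption about what suits short values like `0:26`.

```python
def read_rois(image, rois, ocr=None):
    """Crop each ROI out of a NumPy image and run OCR on it.

    rois maps a label to an (x, y, w, h) tuple in pixels.
    ocr is any callable taking a cropped image and returning a string;
    if omitted, pytesseract is used (requires Tesseract to be installed).
    """
    if ocr is None:
        import pytesseract

        def ocr(crop):
            # --psm 7: treat the crop as a single text line
            return pytesseract.image_to_string(crop, config="--psm 7")

    results = {}
    for name, (x, y, w, h) in rois.items():
        crop = image[y:y + h, x:x + w]  # rows are y, columns are x
        results[name] = ocr(crop).strip()
    return results
```

Usage would look like `read_rois(img, {"timer": (5, 2, 40, 16), "score": (60, 2, 25, 16)})`, where the labels and numbers are placeholders. Since the layout is assumed to be fixed, the ROIs only need to be measured once.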
