So I have the following image:

enter image description here

I'm trying to extract three arrays:

var a = [30,31,32,35,37,40,44];
var b = [6,7,11,15,18,21,22];
var c = [5,11,15,18,23,37,28];

I tried running `tesseract ~/Desktop/test.png out` on this image, to no avail:

9 % ooenesew @
5 ‘ 904399

And here is the result from `ocrad ~/Desktop/test.ppm`:

o
?
28

Can any OCR experts suggest what I might try next? I'm comfortable using Python/OpenCV, but will try anything.

Eamorr
  • Not an OCR expert, but do you know that you're always going to scan that image non-skewed and at the same scale? Do you know what each of the possible numbers (I'm assuming 1-50 or so) look like? If so, you can reduce this problem from OCR to feature matching. – Foon Apr 05 '15 at 13:07
  • Hi, it will always be non-skewed and at the same scale. The number range is actually [1-45]. Yes, I might be able to do feature matching. – Eamorr Apr 05 '15 at 13:11
  • This seems to be a rather unusual font, and you're only trying to match a very limited set of characters. So I think you first need to [Train Tesseract](https://code.google.com/p/tesseract-ocr/wiki/TrainingTesseract3). – Lukas Graf Apr 05 '15 at 13:16

1 Answer

If your images always look like the example, you will probably need some cleanup first to remove everything that is not a number (the black background and the circle). After that, the method described in the accepted answer to the linked question might be sufficient for your needs, since it looks like you are not dealing with different fonts and sizes: Simple Digit Recognition OCR in OpenCV-Python
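Because the font and scale are fixed, recognition can be reduced to comparing each cleaned-up digit crop against labelled reference crops. Here is a minimal sketch of that idea in plain Python. It is a toy nearest-template classifier, not the code from the linked answer, and it assumes the crops are already binarized (0/1) and resized to a common shape. The names `distance`, `classify` and the 3x3 `TEMPLATES` are made up for illustration.

```python
# Toy nearest-template digit classifier (1-nearest-neighbour on raw pixels).
# Assumes every crop is binarized (0/1) and resized to the same shape.

def distance(a, b):
    """Count the pixels where two equally sized binary crops differ."""
    return sum(pa != pb
               for row_a, row_b in zip(a, b)
               for pa, pb in zip(row_a, row_b))


def classify(crop, templates):
    """Return the label of the template closest to `crop`."""
    return min(templates, key=lambda label: distance(crop, templates[label]))


# Hypothetical 3x3 templates; real ones would be cut from sample images.
TEMPLATES = {
    "1": [[0, 1, 0],
          [0, 1, 0],
          [0, 1, 0]],
    "7": [[1, 1, 1],
          [0, 0, 1],
          [0, 0, 1]],
}

# A "7" with one flipped pixel still lands on the right template.
noisy_seven = [[1, 1, 1],
               [0, 0, 1],
               [0, 1, 1]]
print(classify(noisy_seven, TEMPLATES))
```

In practice you would use larger crops (for example 10x10) and several samples per digit, then read two-digit numbers by classifying each digit separately.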

ikkjo
  • I'm not sure how I'd remove the black reliably. Is there a way to do a fuzzy select in OpenCV? Start at 0,0, fuzzy select all black pixels and if the total area is greater than some threshold, delete the fuzzy area. – Eamorr Apr 05 '15 at 13:25
  • You can use findContours to get the connected components and then pick out candidate characters by filtering out those that do not match certain criteria. In your case, you might get away with using the size and/or the aspect ratio of a contour's bounding box (boundingRect) to decide whether to filter it out. The linked code already takes a similar approach and implements a basic form of filtering based on area (if cv2.contourArea(cnt)>50:) and height (if h>28:). – ikkjo Apr 05 '15 at 16:15
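The contour filtering described in the last comment could look roughly like this. Treat it as a sketch: the area and height thresholds are the ones quoted from the linked code, while `max_aspect`, the Otsu threshold and the assumption of light digits on a dark background are my own guesses and will need tuning for the actual image. `keep_box` and `digit_boxes` are hypothetical helper names.

```python
def keep_box(w, h, area, min_area=50, min_h=28, max_aspect=1.2):
    """Decide whether a contour's bounding box looks like a digit.

    min_area and min_h mirror the values quoted from the linked code;
    max_aspect (width / height) is an assumed extra check.
    """
    return area > min_area and h > min_h and w / h <= max_aspect


def digit_boxes(gray):
    """Return (x, y, w, h) boxes of digit-like contours, sorted by x."""
    import cv2  # imported lazily so keep_box works without OpenCV installed

    # Otsu threshold; assumes light digits on a dark background
    # (switch to THRESH_BINARY_INV if it is the other way round).
    _, binary = cv2.threshold(gray, 0, 255,
                              cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    # [-2] selects the contour list whether findContours returns
    # two values (OpenCV 2.4 / 4.x) or three (OpenCV 3.x).
    contours = cv2.findContours(binary, cv2.RETR_LIST,
                                cv2.CHAIN_APPROX_SIMPLE)[-2]
    boxes = []
    for cnt in contours:
        x, y, w, h = cv2.boundingRect(cnt)
        if keep_box(w, h, cv2.contourArea(cnt)):
            boxes.append((x, y, w, h))
    return sorted(boxes)
```

You would call it as `digit_boxes(cv2.imread(path, cv2.IMREAD_GRAYSCALE))`. Large blobs such as the background or the circle should fail the aspect-ratio or size check, but only if the thresholds match your image.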