
I'm working on OCR, and right now I'm working on separating each individual character from the others. For example, if I have an image that says the following:

12345678.90

I want to detect the x,y coordinates of where each character starts and ends in the image, so that I can determine how many characters there are to process, and then extract each individual number/character and process it.

I have devised a simple algorithm for doing it, and I want some opinions / reviews on how it could be improved.

(In this application I only have to process numbers, but if this algorithm could also separate out letters, that'd be even better.)

  • 1) I would read the pixels in the image in a straight line along the bottom of the image. For example, if the image is 30x30, I would read from 0,30 to 30,30.

  • 2) I will compare the color of each pixel. Having already determined the background and foreground colors, I will check whether each pixel's color is the background or the foreground.

  • 3) If it's the background, it will be ignored. If I encounter a foreground pixel, that indicates the start of a digit. In that case, I would note the location and then start reading the pixels upwards. E.g., if at 5,30 I detect a foreground color, I would read 5,29, 5,28, etc.

  • 4) I would read the pixels upwards (y axis) until I encounter a pixel in the background color. This should give me the height of the character. (I know that for some characters, like 5, it would be more complicated; let's ignore them for now.) So I'd determine, e.g., that the character goes from 5,20 to 5,30 vertically.

  • 5) Then I would go back to the point on the x axis (5,30) where I detected the character's start horizontally, and continue reading horizontally to determine the width of the character, e.g. 6,30, 7,30, etc.

  • 6) Here's the tricky step. I'm guessing that between each of the characters in the following:

    12345678.90

there is a gap of a pixel or so in the background color. It may not be visible to us, but it is there, and the program will find it as it goes pixel by pixel horizontally, reading the colors. That would tell it where the character ends horizontally. So, e.g., it might detect the background color pixel at 15,30.

  • 7) That's the algorithm. It should give the x,y coordinates of where each character starts and where it ends. In the example above, the character would run from 5,20 to 15,30, and would be 10x10.
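Steps 1 to 7 can be sketched roughly as follows. This is a minimal illustration under assumptions the question does not state: the image is already binarized into a list of strings ('#' for foreground, '.' for background), coordinates are (x, y) with y = 0 at the top, and the names `scan_bottom_row`, `is_foreground` and `sample` are hypothetical.

```python
# Assumed input: a binarized image, '#' = foreground, '.' = background.
# Coordinates are (x, y) with y = 0 at the top, so the bottom row is y = height - 1.


def is_foreground(img, x, y):
    return img[y][x] == "#"


def scan_bottom_row(img):
    """Literal sketch of steps 1 to 7; returns boxes as (x_start, y_top, x_end, y_bottom)."""
    height, width = len(img), len(img[0])
    bottom = height - 1
    boxes = []
    x = 0
    while x < width:
        # Steps 1 to 3: read the bottom row left to right, skipping background pixels.
        if not is_foreground(img, x, bottom):
            x += 1
            continue
        # Step 4: from the first foreground pixel, read upwards until background.
        top = bottom
        while top > 0 and is_foreground(img, x, top - 1):
            top -= 1
        # Steps 5 and 6: read rightwards along the bottom row until the assumed gap.
        x_end = x
        while x_end + 1 < width and is_foreground(img, x_end + 1, bottom):
            x_end += 1
        # Step 7: record the character's box and continue after it.
        boxes.append((x, top, x_end, bottom))
        x = x_end + 1
    return boxes


sample = [
    ".#...#..",
    "##...#..",
    ".#...#..",
    "###..###",
]
print(scan_bottom_row(sample))  # [(0, 3, 2, 3), (5, 0, 7, 3)]
```

For the first glyph in the sample (a '1' with a flat base), the box comes out only one pixel tall: column 0 is foreground only in the bottom row, so the upward read in step 4 stops immediately.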

Could this algorithm be improved, and/or am I correct in my assumption on step 6?

Ali

3 Answers


A common approach I know of for segmenting digits is the sliding window. The basic idea is that you slide a window of some size over the image of digits.

Each position of the sliding window produces a small image (you look only at the pixels covered by the window). The window is narrow. A classifier can then be trained to map each window to 1 or 0, where 1 indicates that the window is centered on the split between two digits, and 0 indicates that it is not.

You would need some training data to train the classifier. Or you can try to use unsupervised learning.
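A rough sketch of the sliding-window loop, assuming a binarized image stored as a list of strings ('#' = foreground). The classifier here is a hypothetical hand-written stand-in that reports a split whenever the window's centre column is empty; in the approach described above it would be replaced by a model trained on labelled windows. All function names are illustrative.

```python
def crop_window(img, x, window_width):
    """Pixels covered by a window spanning columns x .. x + window_width - 1."""
    return [row[x:x + window_width] for row in img]


def stand_in_classifier(window):
    """Hypothetical placeholder for a trained classifier: returns 1 ("split")
    if the window's centre column contains no foreground, else 0."""
    centre = len(window[0]) // 2
    return 1 if all(row[centre] == "." for row in window) else 0


def sliding_window_splits(img, window_width=3, classify=stand_in_classifier):
    """Slide a narrow window left to right, one pixel at a time, and return
    the x coordinates of the window centres that the classifier labels 1."""
    width = len(img[0])
    splits = []
    for x in range(width - window_width + 1):
        if classify(crop_window(img, x, window_width)) == 1:
            splits.append(x + window_width // 2)
    return splits


sample = [
    ".#...#..",
    "##...#..",
    ".#...#..",
    "###..###",
]
print(sliding_window_splits(sample))  # [3, 4]
```

Passing a different `classify` function is where a trained model would plug in; the loop itself stays the same.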

EDIT: This video may be useful: https://www.youtube.com/watch?v=y6ga5DeVgSY

kudkudak
  • do you mean literally sliding a GUI window over images? – Ali Sep 15 '13 at 22:09
  • It is not easy to explain in text. Please see the video it should explain everything much better. What I mean is that you slide a "virtual" window over the image, that looks at different regions of the image, and you can make a decision (using some algorithm, like supervised learning) whether this particular region of the image contains "space" or split between digits. – kudkudak Sep 15 '13 at 22:11
  • Can you link another video to describe this method please? The linked video is no longer on Youtube. – steve Jun 07 '17 at 17:50

DISCLAIMER: I never wrote any OCR-like software before.

To me, your algorithm seems a bit off, because of the following reasons:

  • 1 does not start where you find its first pixel at the bottom, because of the little stroke at the top of the 1 that points to the left.
  • 2 would come out only a few pixels high, since you go straight up until you find a background pixel.
  • 3 would end up only 1 pixel by 1 pixel, for the same reason.
  • etc.

I would try a recursive algorithm that follows foreground pixels as far as it can without stepping onto background pixels. With big images containing big characters this might cause a stack overflow, so it would be better to do it with loops (for example, keeping the pixels still to visit on an explicit stack) instead of a recursive function.
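A minimal sketch of the loop-based version, assuming a binarized image stored as a list of strings ('#' = foreground). It keeps pixels to visit on an explicit stack instead of recursing, and it treats diagonal neighbours as connected (8-connectivity), an assumption chosen so that thin diagonal strokes stay in one piece. `connected_components` is a hypothetical name.

```python
def connected_components(img):
    """Flood-fill every 8-connected foreground region without recursion.
    Returns bounding boxes (x_min, y_min, x_max, y_max), sorted left to right."""
    height, width = len(img), len(img[0])
    seen = [[False] * width for _ in range(height)]
    boxes = []
    for sy in range(height):
        for sx in range(width):
            if img[sy][sx] != "#" or seen[sy][sx]:
                continue
            # New component: grow it from (sx, sy) using an explicit stack.
            stack = [(sx, sy)]
            seen[sy][sx] = True
            x_min = x_max = sx
            y_min = y_max = sy
            while stack:
                x, y = stack.pop()
                x_min, x_max = min(x_min, x), max(x_max, x)
                y_min, y_max = min(y_min, y), max(y_max, y)
                # Push every unvisited foreground neighbour (including diagonals).
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        nx, ny = x + dx, y + dy
                        if (0 <= nx < width and 0 <= ny < height
                                and not seen[ny][nx] and img[ny][nx] == "#"):
                            seen[ny][nx] = True
                            stack.append((nx, ny))
            boxes.append((x_min, y_min, x_max, y_max))
    boxes.sort()
    return boxes


sample = [
    ".#...#..",
    "##...#..",
    ".#...#..",
    "###..###",
]
print(connected_components(sample))  # [(0, 0, 2, 3), (5, 0, 7, 3)]
```

In this sample the box for the '1' covers its full height, including the stroke on top, because the fill follows every attached foreground pixel rather than a single column or row.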

If you do this pixel-by-pixel discovery of each character, you can use the same process to build vector information about what the character looks like. I think that would be a good starting point for recognizing the characters.

Martijn Courteaux
  • You are correct about 1, although it would depend on the font: some fonts draw a line underneath the 1. Still, good point. For 2 and 3, if I changed the algorithm to read along the x axis until it encountered background (e.g. from 5,30 to 15,30), and then read upwards until background (e.g. 15,30 to 15,20), that might solve the problem for 5, 2, and 3. What do you think? – Ali Sep 15 '13 at 21:28
  • What I think is that the whole idea of "reading until..." is wrong. Use a recursive method. Find all foreground pixels that are attached to each other. – Martijn Courteaux Sep 15 '13 at 21:49
  • But in that case, how would I tell where one number separates from the other? Wouldn't 12345 all appear joined together rather than separate? – Ali Sep 15 '13 at 21:58
  • Since you said you assume there is at least a 1-pixel gap, it shouldn't. However, in my browser 34 is rendered connected. If that is an issue, you could try a threshold combined with an expected width for the character, based on an average aspect ratio for a character (I guess it is somewhere around 2:3). – Martijn Courteaux Sep 15 '13 at 22:02
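One way to act on that last suggestion, sketched under assumptions: boxes come from some earlier segmentation step as inclusive (x_min, y_min, x_max, y_max) coordinates, the 2:3 width-to-height ratio is the guess from the comment, and the 1.5 tolerance and the equal-width slicing are arbitrary illustrative choices, not a tested rule. `split_wide_box` is a hypothetical name.

```python
def split_wide_box(box, aspect=2 / 3, tolerance=1.5):
    """Split a bounding box that is much wider than one character should be.

    box:       (x_min, y_min, x_max, y_max), inclusive pixel coordinates.
    aspect:    assumed average width / height of a character (the 2:3 guess).
    tolerance: hypothetical threshold; wider than tolerance * expected width => split.
    """
    x_min, y_min, x_max, y_max = box
    width = x_max - x_min + 1
    height = y_max - y_min + 1
    expected = max(1, round(height * aspect))
    if width <= tolerance * expected:
        return [box]
    # Guess how many characters were merged and cut the box into equal-width slices.
    pieces = max(2, round(width / expected))
    return [
        (x_min + i * width // pieces, y_min,
         x_min + (i + 1) * width // pieces - 1, y_max)
        for i in range(pieces)
    ]


print(split_wide_box((0, 0, 13, 8)))  # [(0, 0, 6, 8), (7, 0, 13, 8)]
print(split_wide_box((0, 0, 5, 8)))   # [(0, 0, 5, 8)]
```

Equal-width slicing is crude; cutting at the column with the fewest foreground pixels near each slice boundary would likely separate touching digits such as "34" more cleanly.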

I've not tried to write OCR software, but we do use it, and it is (or can get) very complicated.

It's not totally clear where your image is coming from; if it's a scanned image, there are several complications. Most relevant to your plan: even if there is a gap between digits, it may not be vertical (the scanned page is very unlikely to be perfectly straight). Other factors include "speckle": random dots caused by dirt etc. on the image or the scanner. If you're processing this kind of image, you almost certainly need to look at image-processing techniques that apply mathematical operations to the whole array of pixels, for example to deskew (straighten the image), despeckle (get rid of random dots), and enhance edges (strengthen changes from light to dark to make lines clearer).
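As one concrete example of such a whole-image operation, a 3x3 median (majority) filter is a common simple despeckling step. This sketch assumes an already binarized image stored as a list of strings ('#' = foreground) and treats pixels outside the image as background; `despeckle` is a hypothetical name. Note that the filter also rounds off corners and can erase strokes only one pixel thick, so it suits scans where strokes are several pixels wide.

```python
def despeckle(img):
    """3x3 median filter on a binary image: a pixel becomes foreground only if
    at least 5 of the 9 pixels in its neighbourhood are foreground.
    Neighbours outside the image count as background."""
    height, width = len(img), len(img[0])
    out = []
    for y in range(height):
        row = []
        for x in range(width):
            count = 0
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    nx, ny = x + dx, y + dy
                    if 0 <= nx < width and 0 <= ny < height and img[ny][nx] == "#":
                        count += 1
            row.append("#" if count >= 5 else ".")
        out.append("".join(row))
    return out


sample = [
    "#.....",  # a single stray pixel (speckle) in the corner
    "......",
    "..###.",
    "..###.",
    "..###.",
    "......",
]
for row in despeckle(sample):
    print(row)
```

Here the stray pixel disappears, while the square survives with its corners trimmed, which shows the trade-off mentioned above.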

From your use of "background" and "foreground" colours, it may be that you're trying to "OCR" an image from the screen? If so (some kind of "screen-scraping" process), and you know (or can be trained with) the specific character-shapes being interpreted, then a variant of the sliding window may help: you slide the known image of a '5' around the image at different offsets: if all the pixels of the '5' match "foreground" pixels in the image, then you know you've found a '5'. Repeat for other digits. As above, this is a "virtual" window we're talking about.
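A minimal sketch of that matching step, assuming the screen text is already binarized into a list of strings ('#' = foreground) and that you have a pixel-exact bitmap for each glyph (here a hypothetical '1' template). The function names and the `strict` option are illustrative additions: checking only the template's foreground pixels, as described above, can also report a match wherever a glyph is fully contained in a larger or bolder one, so `strict=True` additionally requires the template's background pixels to be background in the image.

```python
def matches_at(img, template, ox, oy, strict=False):
    """True if the template, placed with its top-left corner at (ox, oy), fits the image.
    Non-strict: every template foreground pixel must land on image foreground.
    Strict: template background must also land on image background."""
    for ty, template_row in enumerate(template):
        for tx, t in enumerate(template_row):
            pixel = img[oy + ty][ox + tx]
            if t == "#" and pixel != "#":
                return False
            if strict and t != "#" and pixel == "#":
                return False
    return True


def find_template(img, template, strict=False):
    """Slide a known glyph bitmap over every offset and return the (x, y)
    top-left corners where it matches."""
    ih, iw = len(img), len(img[0])
    th, tw = len(template), len(template[0])
    return [
        (ox, oy)
        for oy in range(ih - th + 1)
        for ox in range(iw - tw + 1)
        if matches_at(img, template, ox, oy, strict)
    ]


sample = [
    ".#...#..",
    "##...#..",
    ".#...#..",
    "###..###",
]
one = [
    ".#.",
    "##.",
    ".#.",
    "###",
]
print(find_template(sample, one))  # [(0, 0)]
```

To read a whole string, you would run this once per digit template and sort the hits by their x coordinate.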

TripeHound