I'm working on OCR, and right now I'm working on separating each individual character from the others. E.g., if I have an image that says the following:
12345678.90
I want to detect the x,y coordinates of where each number starts and where it ends in the image, so that I can determine how many numbers there are to process, and then extract each individual number / character and process it.
I have devised a simple algorithm for this, and I'd like some opinions / reviews on how it could be improved.
(In this application I only have to process numbers, but if this algorithm could also separate out letters, that'd be even better.)
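1) I would read the pixels along a straight line at the bottom of the image. E.g., if the image is 30x30, I would read the bottom row from 0,30 to 30,30 (with zero-based pixel indices that row is really y = 29, x = 0 to 29, but I'll call the bottom row y = 30 in the examples below).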
2) I will compare the color of each pixel: having already determined the background and foreground colors, I check whether each pixel is background or foreground.
3) If it's a background pixel, it is ignored. If I encounter a foreground pixel, that indicates the start of a digit. In that case, I note the location and start reading the pixels upwards. E.g., if I detect a foreground color at 5,30, I would then read 5,29, 5,28, etc.
4) I would read the pixels upwards (along the y axis) until I encounter a background pixel. This should give me the height of the character. (I know that for some characters like 5 it would be more complicated; let's ignore them for now.) So I'd determine, e.g., that the character goes from 5,20 to 5,30 vertically.
5) Then I would go back to the point on the bottom row (5,30) where I detected the start of the character, and continue reading horizontally to determine its width, e.g. 6,30, 7,30, etc.
6) Here's the tricky step. I'm guessing that between each pair of adjacent characters in the following:
12345678.90
there is a gap of a pixel or so in the background color. It may not be visible to us, but it is there and will be found by the program as it reads the colors pixel by pixel horizontally. That background pixel would tell it where the character ends horizontally. E.g., it might detect a background pixel at 15,30.
7) That's the algorithm. It should give the x,y coordinates of where each character starts and where it ends. In the example above, the character would run from 5,20 to 15,30, i.e. it is 10x10.
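To make the steps concrete, here is a minimal Python sketch of steps 1 to 7 as described. It assumes the image is a nested list of RGB tuples indexed as pixels[y][x] with y = 0 at the top (so the bottom row is height - 1, i.e. 29 for a 30x30 image), and that step 2 classifies a pixel by whichever of the two known colors it is nearer to. `is_foreground`, `binarize` and `segment_bottom_row` are hypothetical names, not from any library.

```python
from typing import List, Sequence, Tuple

Color = Tuple[int, int, int]
Grid = List[List[bool]]          # grid[y][x] is True for a foreground pixel
Box = Tuple[int, int, int, int]  # (x_start, y_top, x_end, y_bottom), all inclusive


def is_foreground(pixel: Color, fg: Color, bg: Color) -> bool:
    """Step 2 (assumed rule): a pixel is foreground if it is nearer to fg than to bg."""
    dist_fg = sum((a - b) ** 2 for a, b in zip(pixel, fg))
    dist_bg = sum((a - b) ** 2 for a, b in zip(pixel, bg))
    return dist_fg < dist_bg


def binarize(pixels: Sequence[Sequence[Color]], fg: Color, bg: Color) -> Grid:
    """Classify every pixel once, so the scanning steps only deal with booleans."""
    return [[is_foreground(p, fg, bg) for p in row] for row in pixels]


def segment_bottom_row(grid: Grid) -> List[Box]:
    """Steps 1 and 3 to 7: scan the bottom row and return one box per character."""
    if not grid or not grid[0]:
        return []
    height, width = len(grid), len(grid[0])
    bottom = height - 1
    boxes: List[Box] = []
    x = 0
    while x < width:
        # Steps 1 and 3: walk the bottom row, ignoring background pixels.
        if not grid[bottom][x]:
            x += 1
            continue
        x_start = x
        # Step 4: read upwards in this one column until a background pixel is hit.
        y_top = bottom
        while y_top > 0 and grid[y_top - 1][x_start]:
            y_top -= 1
        # Steps 5 and 6: keep reading right along the bottom row until the
        # assumed background gap; the pixel before it ends the character.
        while x < width and grid[bottom][x]:
            x += 1
        boxes.append((x_start, y_top, x - 1, bottom))
    return boxes


if __name__ == "__main__":
    # '#' is dark foreground, '.' is white background: a "1" and a rough "2".
    rows = [
        "..........",
        ".#...##...",
        ".#..#..#..",
        ".#....#...",
        ".#...####.",
    ]
    black, white = (0, 0, 0), (255, 255, 255)
    pixels = [[black if c == "#" else white for c in row] for row in rows]
    print(segment_bottom_row(binarize(pixels, black, white)))
    # prints [(1, 1, 1, 4), (5, 4, 8, 4)]
```

On that sample the "2" comes out only one pixel tall, because step 4 follows just the single column where the character first touches the bottom row; it is the same kind of case the question sets aside for "5". Characters that never reach the bottom row are not found at all by this scan.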
Could this algorithm be improved, and/or am I correct in my assumption on step 6?
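One way to check the step-6 assumption on real images is to test whole columns rather than only the bottom row: if neighbouring characters really are separated by at least one background pixel, some column between them contains no foreground pixel at all. The sketch below does that. It is a column projection, a different technique from the bottom-row scan in the steps above, and it assumes a binarized grid where grid[y][x] is True for a foreground pixel; `column_runs` is a hypothetical name.

```python
from typing import List, Optional, Tuple

Grid = List[List[bool]]  # grid[y][x] is True for a foreground pixel


def column_runs(grid: Grid) -> List[Tuple[int, int]]:
    """Return (x_start, x_end) pairs, inclusive, for each run of columns that
    contain at least one foreground pixel; all-background columns are the gaps."""
    if not grid or not grid[0]:
        return []
    width = len(grid[0])
    # A column counts as a gap only if every pixel in it, top to bottom, is background.
    has_ink = [any(row[x] for row in grid) for x in range(width)]
    runs: List[Tuple[int, int]] = []
    start: Optional[int] = None
    for x, ink in enumerate(has_ink):
        if ink and start is None:
            start = x                    # first column of a new character
        elif not ink and start is not None:
            runs.append((start, x - 1))  # previous column ended the character
            start = None
    if start is not None:                # ink runs up to the right edge
        runs.append((start, width - 1))
    return runs


if __name__ == "__main__":
    rows = [
        "..........",
        ".#...##...",
        ".#..#..#..",
        ".#....#...",
        ".#...####.",
    ]
    grid = [[c == "#" for c in row] for row in rows]
    print(column_runs(grid))  # prints [(1, 1), (4, 8)]
```

Here the "2" is found as columns 4 to 8 even though its bottom stroke only starts at column 5. If two glyphs touch (for example through anti-aliasing or a tight font), there is no empty column between them and they come back as a single run, so the one-pixel-gap assumption is worth checking on the actual images.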