3

Good night :)

I am currently playing with the DevIL library, which lets me load an image and check the RGB values of each pixel. As a personal learning project, I'm trying to write a very basic OCR system for a couple of images I made myself in Photoshop.

I can already remove all the distortions from the image, which leaves me with only text and numbers. For now I am not looking for an advanced neural network that learns from input. I want to start out relatively easy, so I've set out to identify the individual characters and count the pixels in each of them.

I have two problems:

  • Identifying the individual characters.
  • Most importantly: I need an algorithm to count connected pixels (of the same color) without counting pixels I've previously counted. I have no mathematical background, so this is the biggest issue for me.

Any help in the matter is appreciated, thanks.

edit:

I have tagged this question as C++ because that is what I am currently using. However, pseudo-code or easily readable code from another language is also fine.

Daniel Sloof

3 Answers

2

The flood fill algorithm will work for counting the included pixels, as long as you have the images filtered down to simple black & white bitmaps.
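As a rough sketch of how flood fill could count connected pixels without counting any pixel twice: the code below assumes the image has already been reduced to a black & white bitmap, stored in a hypothetical `Bitmap` type (a vector of rows, `true` meaning ink). Neither `Bitmap` nor `countBlobs` comes from DevIL; they are names made up for this example. A separate `visited` grid is what prevents double counting, and an explicit stack is used instead of recursion so that large blobs cannot overflow the call stack. Only the 4 direct neighbours are treated as connected, so diagonal-only contact counts as two blobs.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Hypothetical image type: rows of pixels, true = ink, false = background.
using Bitmap = std::vector<std::vector<bool>>;

// Returns the pixel count of every 4-connected blob of ink,
// in the order the blobs are first met when scanning row by row.
std::vector<int> countBlobs(const Bitmap& img) {
    std::vector<int> sizes;
    if (img.empty()) return sizes;

    const int h = static_cast<int>(img.size());
    const int w = static_cast<int>(img[0].size());
    for (const auto& row : img) {
        assert(static_cast<int>(row.size()) == w);  // rectangular image assumed
    }

    // visited[y][x] becomes true once a pixel has been assigned to a blob,
    // which guarantees that no pixel is ever counted twice.
    std::vector<std::vector<bool>> visited(h, std::vector<bool>(w, false));

    const int dy[4] = {-1, 1, 0, 0};
    const int dx[4] = {0, 0, -1, 1};

    for (int y = 0; y < h; ++y) {
        for (int x = 0; x < w; ++x) {
            if (!img[y][x] || visited[y][x]) continue;

            // Unvisited ink pixel: flood fill the blob it belongs to.
            int count = 0;
            std::vector<std::pair<int, int>> stack;
            stack.push_back(std::make_pair(y, x));
            visited[y][x] = true;

            while (!stack.empty()) {
                const std::pair<int, int> p = stack.back();
                stack.pop_back();
                ++count;

                for (int i = 0; i < 4; ++i) {
                    const int ny = p.first + dy[i];
                    const int nx = p.second + dx[i];
                    if (ny < 0 || ny >= h || nx < 0 || nx >= w) continue;
                    if (!img[ny][nx] || visited[ny][nx]) continue;
                    visited[ny][nx] = true;  // mark when pushed, not when popped
                    stack.push_back(std::make_pair(ny, nx));
                }
            }
            sizes.push_back(count);
        }
    }
    return sizes;
}
```

If the characters in the image do not touch each other, each blob found this way is also a candidate character, which covers the first problem in the question as well. Characters made of several pieces (such as "i") would come out as separate blobs and need to be merged afterwards.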

Having said that, you can perform character recognition by comparing each character to a set of standard images of each character in your set, measuring the similarity, and then choosing the character with the highest score.
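One simple way to measure that similarity, sketched here under several assumptions: each candidate character has already been cropped and scaled to the same size as the reference images, and the score is just the fraction of pixels that agree. The names `Glyph`, `Template`, `similarity`, and `classify` are hypothetical and exist only for this example; other scoring methods would slot in the same way.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical glyph type: a small black & white bitmap, true = ink.
using Glyph = std::vector<std::vector<bool>>;

// A reference image together with the character it represents.
struct Template {
    char label;
    Glyph pixels;
};

// Fraction of pixels (0.0 to 1.0) that are identical in both glyphs.
// Assumes both glyphs have exactly the same dimensions.
double similarity(const Glyph& a, const Glyph& b) {
    assert(a.size() == b.size());
    std::size_t same = 0;
    std::size_t total = 0;
    for (std::size_t y = 0; y < a.size(); ++y) {
        assert(a[y].size() == b[y].size());
        for (std::size_t x = 0; x < a[y].size(); ++x) {
            if (a[y][x] == b[y][x]) ++same;
            ++total;
        }
    }
    return total == 0 ? 0.0 : static_cast<double>(same) / total;
}

// Returns the label of the best-scoring template, or '?' if there are none.
char classify(const Glyph& candidate, const std::vector<Template>& templates) {
    char best = '?';
    double bestScore = -1.0;
    for (const Template& t : templates) {
        const double score = similarity(candidate, t.pixels);
        if (score > bestScore) {
            bestScore = score;
            best = t.label;
        }
    }
    return best;
}
```

A plain pixel-agreement score like this is sensitive to shifts and to differences in stroke thickness, so it is only likely to work well when the input uses the same font and size as the reference images.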

Take a look at this question for more information.

e.James
1

Not sure this helps, but there is a GPL OCR lib called gocr.

Gian Paolo
1

Apologies if this is too far off-topic, but IMHO Vigra (not the other one!) is a much better image processing library for C++ than DevIL.

Gian Paolo