28

Note that I'm really looking for an answer to my question. I am not looking for a link to some source code or to some academic paper: I've already used the source and I've already read papers and still haven't figured out the last part of this issue...

I'm working on some fast screen font OCRing and I'm making very good progress.

I'm already finding the baselines, separating the characters, converting each character to black & white and then contouring each character in order to apply a Freeman chain code to it.

Basically it's an 8-connected chain code looking like this:

  3  2  1
   \ | /
  4-- --0
   / | \
  5  6  7
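
To make the numbering above concrete, here is a minimal sketch of how an ordered list of contour pixels could be turned into a chain code. The names `DIRECTIONS` and `chain_code` are made up for illustration, and the sketch assumes image coordinates where y grows downward and consecutive contour points that are 8-neighbours.

```python
# Map a step (dx, dy) to its Freeman direction, following the diagram:
# 0 = east, 2 = north, 4 = west, 6 = south (y grows downward in image space).
DIRECTIONS = {
    (1, 0): 0, (1, -1): 1, (0, -1): 2, (-1, -1): 3,
    (-1, 0): 4, (-1, 1): 5, (0, 1): 6, (1, 1): 7,
}


def chain_code(points, closed=True):
    """Return the chain code string for an ordered list of (x, y) contour points."""
    points = list(points)
    if closed and len(points) > 1:
        points.append(points[0])  # add the step back to the starting pixel
    code = []
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        step = (x1 - x0, y1 - y0)
        if step not in DIRECTIONS:
            raise ValueError(f"{(x0, y0)} and {(x1, y1)} are not 8-neighbours")
        code.append(str(DIRECTIONS[step]))
    return "".join(code)


# A tiny square walked east, north, west, then south back to the start.
square = [(0, 1), (1, 1), (1, 0), (0, 0)]
print(chain_code(square))
```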

So if I have an 'a', after all my transformations (including transforming to black and white), I end up with something like this:

11110
00001
01111
10001
10001
01110

Then its external contour may look like this (I may be making a mistake here, that's ASCII-art contouring and my 'algorithm' may get the contour wrong but that's not the point of my question):

 XXXX
X1111X
 XXXX1X
X01111X
X10001X
X10001X
 X111X
  XXX

Following the Xs, I get the chain code, which would be:

0011222334445656677

Note that that's the normalized chain code, but you can always normalize a chain code like this: among all cyclic rotations of the code, you just keep the one that forms the smallest integer.
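
A minimal sketch of that normalization, assuming the chain code is held as a string of digits 0-7 (the function name `normalize` is only illustrative). Since every rotation has the same length, the lexicographically smallest rotation is also the smallest integer.

```python
def normalize(chain):
    """Return the cyclic rotation of `chain` that forms the smallest integer."""
    if not chain:
        return chain
    # All rotations have equal length, so string order equals numeric order.
    # This is O(n^2), which is fine for the short chains of small glyphs.
    return min(chain[i:] + chain[:i] for i in range(len(chain)))


print(normalize("4556700123"))
```

With this, the same contour gives the same code no matter which pixel the walk started from.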

(By the way, there's a super-efficient implementation to find the chain code where you simply take the 8 adjacent pixels of an 'X' and then look in a 256-entry lookup table to see whether you have 0, 1, 2, 3, 4, 5, 6 or 7)

My question now, however, is: from that 0011222334445656677 chain code, how do I find that I have an 'a'?

Because, for example, if my 'a' looks like this:

11110
00001
01111
10001
10001
01111  <-- This pixel is now full

Then my chain code is now: 0002222334445656677

And yet this is also an 'a'.

I know that the whole point of these chain codes is to be resilient to such tiny changes but I can't figure out how I'm supposed to find which character corresponds to one chain code.

I've gotten this far and now I'm stuck...

(By the way, I don't need 100% efficiency and things like differentiating '0' from 'O' or from 'o' aren't really an issue)

Maple
SyntaxT3rr0r
  • You may already have read it, but the description here: http://www.codeproject.com/KB/recipes/OCR-Chain-Code.aspx seems like it gives a good starting point. My take on it would be that you need to 'train' your software by feeding it identified samples, then when it is fed real data, have it identify the 'closest' match. You don't have to be able to state that the input is definitely an 'a', you just have to be able to say that it's closer to an 'a' than any other symbol you're interested in and that it's close enough to an 'a' that you're willing to accept it. – forsvarir Jul 29 '11 at 07:31
  • @forsvarir: thanks for that link, I've read several but that one I hadn't yet. That said I agree with you but it's really choosing the "closest" that's giving me issues. Do you know whether I should run something like a *"Levenshtein Edit Distance"* to find the closest? That's basically my problem: I don't understand how to pick the closest nor how many inputs I need to feed. – SyntaxT3rr0r Jul 29 '11 at 12:10
  • I don't think a Levenshtein Edit Distance could possibly work: it wouldn't work for a's at different sizes. – SyntaxT3rr0r Jul 29 '11 at 12:55
  • An _Artificial Neural Network_ could provide good results. ANNs are well suited to applications where small changes on the input side do not change the output. But as I read on the (already mentioned) codeproject site [A C# Project in Optical Character Recognition (OCR) Using Chain Code](http://www.codeproject.com/KB/recipes/OCR-Chain-Code.aspx), _Support vector machines_, _K nearest neighbor_ and _Euclidean distance_ are also possible methods in the classification stage. – Christian Ammer Jul 29 '11 at 20:23
  • You talk about the problem of "a's at different sizes": Why don't you scale the input characters to a uniform size before classification -- maybe by squeezing the chain code to a fixed length? – Christian Ammer Jul 29 '11 at 20:32
  • @Christian Ammer: that's a very interesting idea, although these characters are typically very, very small (e.g. 5x7 pixels) and I think scaling from, say, 8x6 pixels to 5x7 pixels is likely to be problematic!? – SyntaxT3rr0r Jul 30 '11 at 12:20
  • Sure, this could be problematic. On another note: could you please share some more chain codes with us, particularly from letters that are difficult to distinguish? Examples are always a good basis. – Christian Ammer Jul 30 '11 at 20:25

4 Answers

18

What you need is a function d that measures the distance between chain codes. Once you have that, finding the letter for a given chain code is straightforward:

Input:

  • normalized chain codes S for the set of possible letters (generally the chain codes for A-Z, a-z, 0-9, ...)
  • chain code x of a letter which needs to be detected and which could be slightly deformed (so the chain code might not exactly match any chain code in the set S)

The algorithm would iterate through the set of possible chain codes and calculate the distance d(x,si) for each element. The letter with the smallest distance would be the output of the algorithm (the identified letter).

I would suggest the following distance function: for two chain codes, add up the absolute differences between the counts of each direction: d(x,si) = |x0-si0| + |x1-si1| + .. + |x7-si7|. Here x0 is the number of 0s in the chain code x, si0 is the number of 0s in the chain code si, etc.
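
As a rough sketch, this distance could be written as follows, assuming chain codes are strings of digits 0-7 (the names `direction_counts` and `chain_distance` are illustrative, not from any library). The two 'a' chain codes from the question serve as input.

```python
from collections import Counter


def direction_counts(chain):
    """Count how often each of the 8 directions occurs in the chain code."""
    counts = Counter(chain)
    return [counts[str(d)] for d in range(8)]


def chain_distance(x, s):
    """d(x, s) = |x0 - s0| + |x1 - s1| + ... + |x7 - s7|."""
    return sum(abs(a - b) for a, b in zip(direction_counts(x), direction_counts(s)))


# The two variants of 'a' from the question differ by a single pixel.
print(chain_distance("0011222334445656677", "0002222334445656677"))
```

Because only the per-direction counts are compared, the order of the steps is ignored, which is what makes the measure tolerant of small deformations but also unable to separate shapes that share the same counts.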

An example will better explain what I'm thinking about. The following image shows the characters 8, B and D; the fourth character is a slightly deformed 8, which needs to be identified. The characters are rendered in Arial at font size 8. The second line in the image is enlarged 10 times to make the pixels easier to see.

[Image: the characters 8, B, D and a deformed 8 in Arial, at original size and enlarged 10 times]

I manually calculated (hopefully correct) the normalized chain codes which are:

8:  0011223123344556756677
B:  0000011222223344444666666666
D:  00001112223334444666666666
8': 000011222223344556756666 (deformed 8)

The absolute differences in the direction counts are:


direction | length         | difference to 8'
          | 8 | B | D |  8'|   8 |  B |  D |
----------+---+---+---+----+-----+----+-----
        0 | 2 | 5 | 4 |  4 |   2 |  1 |  0 |
        1 | 3 | 2 | 3 |  2 |   1 |  0 |  1 |
        2 | 3 | 5 | 3 |  5 |   2 |  0 |  2 |
        3 | 3 | 2 | 3 |  2 |   1 |  0 |  1 |
        4 | 2 | 5 | 4 |  2 |   0 |  3 |  2 |
        5 | 3 | 0 | 0 |  3 |   0 |  3 |  3 |
        6 | 3 | 9 | 9 |  5 |   2 |  4 |  4 |
        7 | 3 | 0 | 0 |  1 |   2 |  1 |  1 |
----------+---+---+---+----+-----+----+-----
                        sum   10 | 12 | 14 |

8' has the smallest distance to the chain code of 8, thus the algorithm would identify the letter 8. The distance to the letter B is not much bigger, but this is because the deformed 8 looks almost like the B.
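
The lookup described above can be sketched as a nearest-template search over the chain codes from this example. The names `TEMPLATES`, `classify` and the optional `max_distance` rejection threshold are assumptions added for illustration; a real template set would cover the whole alphabet.

```python
from collections import Counter

# Normalized chain codes taken from the example above.
TEMPLATES = {
    "8": "0011223123344556756677",
    "B": "0000011222223344444666666666",
    "D": "00001112223334444666666666",
}


def chain_distance(x, s):
    """Sum of absolute differences between per-direction counts."""
    cx, cs = Counter(x), Counter(s)
    return sum(abs(cx[d] - cs[d]) for d in "01234567")


def classify(x, templates, max_distance=None):
    """Return (best label, all distances); label is None if nothing is close enough."""
    distances = {label: chain_distance(x, code) for label, code in templates.items()}
    best = min(distances, key=distances.get)
    if max_distance is not None and distances[best] > max_distance:
        return None, distances
    return best, distances


label, distances = classify("000011222223344556756666", TEMPLATES)
print(label, distances)
```

The threshold is one way to express "close enough to accept": without it, every input is assigned to some letter, however poor the match.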

This method is not scale invariant. I think there are two options to overcome this:

  • For different font sizes, having different sets of normalized chain codes
  • One set of normalized chain codes at a big size (e.g. 35x46 pixels) and scaling the input letter (which needs to be identified) to this bigger size.
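
A third possibility, which is not one of the two options above and is only an assumption on my part, is to compare the fraction of steps in each direction instead of the raw counts, so that the chain length cancels out. A minimal sketch with illustrative names:

```python
from collections import Counter


def direction_fractions(chain):
    """Share of the chain taken up by each of the 8 directions."""
    total = len(chain)
    if total == 0:
        return [0.0] * 8
    counts = Counter(chain)
    return [counts[str(d)] / total for d in range(8)]


def relative_distance(x, s):
    """Sum of absolute differences between direction fractions (0 to 2)."""
    return sum(abs(a - b) for a, b in zip(direction_fractions(x), direction_fractions(s)))


# A square and the same square at twice the size have identical fractions.
print(relative_distance("0246", "00224466"))
```

At very small glyph sizes this is only approximate, because rasterizing the same letter at another size does not scale every run of the contour by the same factor.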

I'm not quite sure if the distance function is good enough for the set of all alphanumeric letters but I hope so. To minimize the error in identifying a letter you could include other features (not only chain codes) into the classification step. And again, you would need a distance measure -- this time for feature vectors.

Christian Ammer
  • +1 amazing answer. Yup, I'm already indeed using other features to discard obvious non-matches or to keep obvious possible matches (while being very careful about false positives/false negatives). It works quite well but I'd gladly use the help of the chain code :) – SyntaxT3rr0r Aug 01 '11 at 23:44
  • That distance function: did you come up with it yourself, or do you know it's used by chain code algorithms? Did you know about these chain codes before? – SyntaxT3rr0r Aug 01 '11 at 23:45
  • I did not know anything about chain codes before. The distance function was my second thought. My first thought (on looking at the chain code of the deformed 8) was to rotate one chain code until the best match (the most positions at which the two chain codes agree) was found. But then a much simpler solution (the counts of each direction) came to mind, which should also give good results, so I made it an answer. – Christian Ammer Aug 02 '11 at 07:21
  • I also did a web search for distance functions. I didn't find such a function but two interesting papers: [Application of Freeman Chain Codes: An Alternative Recognition Technique for Malaysian Car Plates](http://arxiv.org/pdf/1101.1602) and [A Complete Bangla OCR System for Printed Chracters](http://www.uap-bd.edu/jcit_papers/vol-1_no-1/JCIT-100707.pdf). – Christian Ammer Aug 02 '11 at 07:32
  • Why do you have in case of B 0000011 ? When it's only one move diagonally to the right.. and same with D... ? – n32303 Nov 01 '15 at 21:35
  • @NejcLovrencic: Look at the **contour** of the letter. Start at any pixel on this contour and walk counterclockwise. After that, normalize your code (rotate the code till the 0's are first). Then you get 0000011222223344444666666666 for the letter 'B'. – Christian Ammer Nov 10 '15 at 10:55
  • I disagree. If you look at this picture (http://postimg.org/image/rec317ee9/), you can see that if you look at the contour of the letter, there is only one time diagonally up in B, and two times in D. – n32303 Nov 11 '15 at 12:43
  • In case of B: Start at the contour pixel left under the pixel you marked with the red line (you have to imagine the pixel, it's not visible). Then you have one step in direction 1 to get to the contour pixel which you have marked with the red line. And another step in direction 1 to get to the contour pixel right above the pixel you marked (pixel not visible). I didn't make the contour visible because the original question was not about the generation of *normalized freeman chain code*. – Christian Ammer Nov 11 '15 at 20:51
  • Please, take a look at this question https://stackoverflow.com/questions/44344321/normalized-freeman-chain-codes – Pavel_K Jun 03 '17 at 13:13
3

As your question is not specific enough (whether you want the full algorithm based on the chain code or just some probabilistic classifying), I'll tell you what I know about the problem.

Using the chain code, you can count some properties of the symbol, e.g. the number of rotations of the form 344445, 244445, 2555556, 344446 (arbitrary number of 4s), i.e. the "spikes" on the letter. Say there are 3 sections in the chain code that look like this. Then this is almost certainly a "W"! But this is the easy case. You can count the numbers of different kinds of rotations and compare them to previously saved values for every letter (which you determine by hand). This is quite a good classifier, but alone it is not sufficient, of course. It will be impossible for it to differentiate "D" and "O", or "V" and "U". And much depends on your imagination.

You should start by creating a test set of letter images with reference labels, then check your algorithm against it after each change and whenever you invent new criteria.
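
One way to count such sections is a regular expression over the chain string. This is only a sketch: the pattern below is my own generalization of the four examples above (a 2 or 3, a run of 4s, then a 5 or 6, plus the 2, run of 5s, 6 case), and the name `count_spikes` is made up. The chain is doubled so that a spike straddling the start of the cyclic code is still found.

```python
import re

# Hypothetical generalization of 344445, 244445, 344446 and 2555556.
SPIKE = re.compile(r"[23]4+[56]|25+6")


def count_spikes(chain):
    """Count spike-like sections in a cyclic chain code given as a digit string."""
    n = len(chain)
    doubled = chain + chain  # lets a match wrap around the end of the chain
    # Only matches that start in the first copy are counted, so none is counted twice.
    return sum(1 for m in SPIKE.finditer(doubled) if m.start() < n)


print(count_spikes("03445024460255560"))
```

The counts for each kind of pattern could then be stored per letter and compared with the counts of the chain being classified.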

Hope this answers your question at least partially.

Update: One bright idea just came into my mind :) You can count the number of monotonic sequences in the chain, for example, for chain 000111222233334443333222444455544443333 (a quick dumb example, doesn't really correspond to any letter) we have

00011122223333444 | 3333222 | 4444555 | 44443333

i.e. four monotonic subsequences.

This should be a good generalization: just count the number of these changes for real letters and compare it to the count obtained from the detected chain. It's worth a try.

Some problems and ideas:

  1. The chain is cyclic in a way, so you should handle monotonicity across the ends of the chain (to avoid off-by-one errors),
  2. Some artifacts should be accounted for: for example, if you know that the letter is big enough (say, 20 pixels in height), you would want to ignore monotonicity interruptions shorter than 3 items :)
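
A minimal sketch of the run counting, including the cyclic case from point 1. Two things here are assumptions rather than part of the idea as stated: the step between two codes is taken modulo 8 (so 7 followed by 0 counts as increasing), and short interruptions are not filtered out. The function names are illustrative.

```python
def signed_turn(a, b):
    """Smallest signed step from direction a to direction b, in the range -4..3."""
    return (b - a + 4) % 8 - 4


def count_monotonic_runs(chain, cyclic=True):
    """Count maximal increasing/decreasing runs in a chain code string."""
    codes = [int(c) for c in chain]
    pairs = list(zip(codes, codes[1:]))
    if cyclic and len(codes) > 1:
        pairs.append((codes[-1], codes[0]))  # step from the end back to the start
    # Keep only the sign of each non-zero step; repeated codes do not break a run.
    signs = [1 if t > 0 else -1 for t in (signed_turn(a, b) for a, b in pairs) if t != 0]
    if not signs:
        return 0
    changes = sum(1 for s, t in zip(signs, signs[1:]) if s != t)
    if cyclic:
        if signs[-1] != signs[0]:
            changes += 1  # the last run does not merge with the first one
        return max(changes, 1)
    return changes + 1


print(count_monotonic_runs("000111222233334443333222444455544443333"))
```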
unkulunkulu
  • +1, you bet it does! But still: can you elaborate a bit more on how I'd do the counting/lookup? Are there any specific data-structures that would be helpful? I know fully about the 'D' / 'O' / '0' issue and that's not a problem: I don't need 100% accuracy. – SyntaxT3rr0r Jul 30 '11 at 12:22
  • @unkulunkuly: also, is there something special I need to do so that you get the bounty in 6 days? – SyntaxT3rr0r Jul 30 '11 at 12:23
  • @SyntaxT3rr0r, sorry, but I've never implemented such an algorithm. I just remember this idea from a course I attended; the lecturer mentioned it, but no concrete examples were given. I think we should wait a bit longer for an answer, I would be interested too :) – unkulunkulu Jul 30 '11 at 12:32
  • @SyntaxT3rr0r, I think there's no way to defer the bounty. If you don't present it in time, it will get lost unless someone writes an answer which receives at least +2 in which case she gets half of the bounty. – unkulunkulu Jul 30 '11 at 14:16
  • better not have the bounty lost. How can I make sure you get the bounty if you're the only one to answer? – SyntaxT3rr0r Jul 30 '11 at 14:46
  • @SyntaxT3rr0r, I guess there's no way, searching on meta could be an option. Anyway, the longer your question stays on the featured tab, the more people view it, so don't be too hasty :) – unkulunkulu Jul 30 '11 at 14:49
  • I don't know why I see a "please avoid extended discussions in comments" notice... Maybe SO shouldn't allow content to be archived by search engines because comments, well, they're not answers!? – SyntaxT3rr0r Jul 30 '11 at 22:06
  • @Syntax, it's just because SO is a Q&A site, not a forum and we're chatting in a way :) – unkulunkulu Jul 30 '11 at 22:10
0

Last month, I was dealing with the same problem. I have now solved it using the vertex chain code.

The vertex chain code is a binary chain code. I cut it into 5 parts. Obviously, each of the digits 0-9 has its own characteristics in the different parts.

Stefan
xu2mao
0

You could convert the chain code into an even simpler model that conveys the topology and then run machine learning code (which one would probably write in Prolog).

But I wouldn't endorse it. People have done/tried this for years and we still have no good results.

Instead of wasting your time with this non-linear/threshold based approach, why don't you just use a robust technique based on correlation? The easiest thing would be to convolve with templates.
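
A minimal sketch of template matching by correlation, assuming the glyphs are already segmented and have the same size as the templates, so only a single alignment is scored rather than a full convolution. The 'a' bitmaps come from the question; the 'o' template and all names are made up for illustration.

```python
from math import sqrt


def to_vector(rows):
    """Flatten a list of '0'/'1' strings into a list of ints."""
    return [int(ch) for row in rows for ch in row]


def correlation(a, b):
    """Normalized (Pearson) correlation between two equally sized pixel vectors."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0  # a blank or completely filled bitmap carries no shape information
    return cov / sqrt(var_a * var_b)


TEMPLATES = {
    "a": ["11110", "00001", "01111", "10001", "10001", "01110"],
    "o": ["01110", "10001", "10001", "10001", "10001", "01110"],  # hypothetical
}


def best_match(rows, templates):
    """Return the label of the template that correlates best with the glyph."""
    glyph = to_vector(rows)
    return max(templates, key=lambda label: correlation(glyph, to_vector(templates[label])))


# The second 'a' from the question, with one extra pixel in the last row.
variant = ["11110", "00001", "01111", "10001", "10001", "01111"]
print(best_match(variant, TEMPLATES))
```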

But I would develop Gabor wavelets on the letters and sort the coefficients into a vector space. Train a support vector machine with some examples and then use it as a classifier.

This is pretty much how our brain does it and I'm sure it's possible in the computer.

Some random chit chat (ignore):

I wouldn't use neural networks because I don't understand them and therefore don't like them. However, I'm always impressed by the work of Geoff Hinton's group http://www.youtube.com/watch?v=VdIURAu1-aU.

Somehow he works on networks that can propagate information backward (deep learning). There is a talk of his where he lets a trained digit recognition network dream. That means he sets one of the output neurons to "2" and the network will generate pictures of things that it thinks are twos on the input neurons.

I found this very cool.

whoplisp