
I am very curious about making a handwriting recognition application in a web browser. The user draws a letter, Ajax sends the data to the server, a neural network finds the closest matches, and the server returns the results. So if you draw an "a", the first result should be "a", then "o", then "e", something like that.
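
The ranked output described above ("a", then "o", then "e") is typically produced by having the network emit one score per candidate letter and sorting those scores. A minimal sketch, assuming the server already has such scores; `rank_letters` and the example scores are hypothetical, and softmax is just one common way to turn raw scores into probabilities:

```python
import math


def rank_letters(scores, k=3):
    """Convert raw output-layer scores (one per letter) into the k most likely
    letters, each paired with a softmax probability, best match first."""
    # Subtract the largest score before exponentiating to avoid overflow.
    top = max(scores.values())
    exps = {letter: math.exp(s - top) for letter, s in scores.items()}
    total = sum(exps.values())
    probs = {letter: e / total for letter, e in exps.items()}
    # Highest probability first; keep only the top k.
    return sorted(probs.items(), key=lambda item: item[1], reverse=True)[:k]


# Hypothetical scores the network might produce for a drawn "a".
example = rank_letters({"a": 3.1, "o": 2.4, "e": 1.9, "x": -0.5})
print([letter for letter, _ in example])  # ['a', 'o', 'e']
```

The probabilities could be returned to the browser alongside the letters so the UI can show how confident each suggestion is.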

I don't know much about neural networks. What kind of data would I need to pass to the NN? Could it be an array of the x/y coordinates where the user has drawn on a pad? Or what type of data does the neural network expect, or would produce the best results for handwriting?

Fred Foo
thenengah

3 Answers


Commonly, simple NNs for image/handwriting recognition take a 2-D boolean matrix as input, i.e., a black-and-white bitmap. Make sure you have a training set of these available, or let the user train the network with online backprop learning (updating the weights after each example the user draws).
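
Turning the x/y points from the drawing pad into such a bitmap can be sketched as follows. This assumes the browser sends a flat list of `(x, y)` points plus the pad's width and height; `to_bitmap` and `flatten` are hypothetical helpers, and this naive version only marks cells that contain a sampled point (a fuller version would also fill in the line between consecutive points):

```python
def to_bitmap(points, width, height, size=16):
    """Rasterize (x, y) pen points from a width x height drawing pad into a
    size x size boolean grid (True = ink)."""
    grid = [[False] * size for _ in range(size)]
    for x, y in points:
        # Scale pad coordinates to grid cells, clamping to the grid's edges.
        col = min(max(int(x * size / width), 0), size - 1)
        row = min(max(int(y * size / height), 0), size - 1)
        grid[row][col] = True
    return grid


def flatten(grid):
    """One input value per pixel, row by row: True -> 1.0, False -> 0.0."""
    return [1.0 if cell else 0.0 for row in grid for cell in row]


inputs = flatten(to_bitmap([(0, 0), (100, 100), (199, 199)], 200, 200, size=4))
print(len(inputs))  # 16 input neurons for a 4 x 4 bitmap
```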

@FrustratedWithFormsDesigner's suggestion of also sending the order could make the NN a lot "smarter", but if you're just learning, try the bitmap version first and see how well it works. Also, experiment with the bitmap granularity (resolution). You might try digit recognition first; there are standard datasets for that problem on the web.
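
One way to experiment with granularity is to capture a fine bitmap once and derive coarser versions from it, then compare how well the network does at each size. A sketch, assuming square boolean grids; `downsample` is a hypothetical helper, and "any ink in the block" is just one pooling rule (averaging the block is another):

```python
def downsample(grid, factor):
    """Shrink a square boolean bitmap by `factor` in each direction; a coarse
    cell is True if any pixel in the corresponding factor x factor block is True."""
    size = len(grid)
    if size % factor != 0:
        raise ValueError("grid size must be divisible by factor")
    coarse = size // factor
    return [
        [
            any(grid[r * factor + i][c * factor + j]
                for i in range(factor) for j in range(factor))
            for c in range(coarse)
        ]
        for r in range(coarse)
    ]


# A 4 x 4 bitmap with one inked pixel in the bottom-right corner.
fine = [[False] * 4 for _ in range(4)]
fine[3][3] = True
print(downsample(fine, 2))  # [[False, False], [False, True]]
```

Coarser grids mean fewer input neurons and more tolerance to small shifts in where the user drew, at the cost of losing fine detail.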

Fred Foo
  • Can you guys recommend any resources you have used for getting started on this type of project? :) – thenengah Jan 06 '11 at 16:47

Not only would you need to send the X/Y coordinates, but also the ORDER in which they were drawn. So a path might be better than just a set of points. A neural net should be able to handle that, and there are many ways it could. One way might be to divide the path into n segments for n neurons and have each neuron recognize a piece of the letter.
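
The "n segments for n neurons" idea can be sketched like this. The answer does not say what each neuron should see, so summarizing each segment by its direction (a unit vector, i.e. two input values per segment) is an assumption; `segment_directions` is a hypothetical helper. Note that the same points in reverse order produce different features, which is exactly why the drawing order matters:

```python
import math


def segment_directions(path, n):
    """Split an ordered path of (x, y) points into n consecutive segments and
    describe each by its net direction as a unit vector, giving 2 * n inputs."""
    if len(path) < n + 1:
        raise ValueError("need at least n + 1 points to form n segments")
    last = len(path) - 1
    features = []
    for k in range(n):
        # Evenly spaced boundary indices, so the n segments cover the whole path in order.
        (x0, y0), (x1, y1) = path[k * last // n], path[(k + 1) * last // n]
        dx, dy = x1 - x0, y1 - y0
        length = math.hypot(dx, dy)
        if length == 0:
            features += [0.0, 0.0]  # the pen did not move over this segment
        else:
            features += [dx / length, dy / length]
    return features


# Screen coordinates (y grows downward): right, then down.
stroke = [(0, 0), (1, 0), (2, 0), (2, 1), (2, 2)]
print(segment_directions(stroke, 2))        # [1.0, 0.0, 0.0, 1.0]
print(segment_directions(stroke[::-1], 2))  # [0.0, -1.0, -1.0, 0.0]
```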

FrustratedWithFormsDesigner
  • Can you guys recommend any resources you have used for getting started on this type of project? :) – thenengah Jan 06 '11 at 16:45
  • What do you mean by n segments for n neurons? – thenengah Jan 06 '11 at 16:46
  • I admit I have never actually attempted this with neural networks, but if you give me some time I might be able to dig out some old links from the dusty parts of my bookmarks file. In the meantime, start googling things like "intro to neural networks" or "neural network tutorial" to get started. Do you have an AI book that has any chapters on neural nets? – FrustratedWithFormsDesigner Jan 06 '11 at 16:47
  • @Sam: A neural network will have some number of neurons at the input layer, so the easiest thing to do would be to break the path into enough segments so that each neuron will try to recognize a single segment. If the path is very short, it's possible that you may have < *n* segments. I reiterate, I have never attempted this, I just came up with it now. ;) – FrustratedWithFormsDesigner Jan 06 '11 at 16:48
  • lol, looking forward to the links! This is a really complicated subject, not only because it is conceptually hard but also because it's relatively 'new', although it was in use many years ago; by 'new' I mean the application side. I have searched for this for quite some time. There are lots of theses and such explaining the concepts, but as a programmer I'm looking for applications, which 'dem scientists don't usually provide ;) – thenengah Jan 06 '11 at 16:49
  • @Sam: Ah, are you looking for an API which you could simply pass some input to and get an answer saying "this is the letter E"? – FrustratedWithFormsDesigner Jan 06 '11 at 16:50
  • Yes and no. Anything like a tutorial or a good explanation would help. I'm actually doing this for Chinese characters, so the process is much harder than for the Roman alphabet, and I'm sure I will have to build a custom system to handle 5,000 characters (my limit :), although there are many, many more. – thenengah Jan 06 '11 at 16:55
  • The larger the set of characters it has to recognize, the bigger the neural network has to be, and the longer it takes to train. For something as complex as CJK ideographs, I recommend breaking the problem into sub-tasks, such as recognizing each radical and then using another network/layer to recognize the full characters. – Markus Jarderot Jan 10 '11 at 19:53
  • @FrustratedWithFormsDesigner I have a similar task (GPS coordinates) and have been looking for information for a couple of days now. The most interesting thing I've found so far is your mention of using a "path". How could one do that? Do you have any more information on it? (I don't have the same number of GPS coordinates for each sample, which is one of my difficulties.) – Verena Haunschmid Mar 22 '15 at 08:41
  • @ExpectoPatronum: By "path", I really just meant that the coordinates should be stored/used together with the order in which they were processed. A path might look like this: `{ [[x1,y1],n1], [[x2,y2],n2], ...}` where the `x*` and `y*` values are the x/y coordinates and the `n*` value is the position in the sequence in which that coordinate was received/processed, for example: `{ [[45.123, 10.565], 0], [[43.223, 11.563], 1], ...}`. – FrustratedWithFormsDesigner Mar 23 '15 at 14:45
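
For the variable-length problem raised in the comments, one common approach (not something the commenters describe) is to sort the points by their sequence number and then resample the path to a fixed number of points spaced evenly along its length, so every sample produces the same number of inputs. A sketch, assuming the `[[x, y], n]` layout from the comment; `from_indexed` and `resample` are hypothetical helpers, and treating GPS latitude/longitude as flat x/y is only a rough approximation over short distances:

```python
import math


def from_indexed(samples):
    """Turn [[x, y], n] pairs into a list of (x, y) points ordered by n."""
    return [tuple(xy) for xy, _ in sorted(samples, key=lambda s: s[1])]


def resample(path, count):
    """Resample an ordered list of (x, y) points to `count` points spaced evenly
    along the path's length, so every sample yields the same number of inputs."""
    if count < 2 or not path:
        raise ValueError("need a non-empty path and count >= 2")
    # Cumulative distance travelled along the path at each original point.
    dists = [0.0]
    for (x0, y0), (x1, y1) in zip(path, path[1:]):
        dists.append(dists[-1] + math.hypot(x1 - x0, y1 - y0))
    total = dists[-1]
    if total == 0:
        return [tuple(path[0])] * count  # all points coincide
    out = []
    seg = 0
    for i in range(count):
        target = total * i / (count - 1)
        # Advance to the original segment that contains the target distance.
        while seg < len(path) - 2 and dists[seg + 1] < target:
            seg += 1
        (x0, y0), (x1, y1) = path[seg], path[seg + 1]
        span = dists[seg + 1] - dists[seg]
        t = 0.0 if span == 0 else min(max((target - dists[seg]) / span, 0.0), 1.0)
        # Linear interpolation between the segment's end points.
        out.append((x0 + t * (x1 - x0), y0 + t * (y1 - y0)))
    return out


points = from_indexed([[[43.223, 11.563], 1], [[45.123, 10.565], 0], [[44.0, 12.0], 2]])
print(len(resample(points, 8)))  # 8
```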

The basic process is to accumulate a number of examples of each letter to be identified, pre-process the raw data, train a collection of candidate models and choose a final model based on test performance on a separate, holdout set of data.
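
The "train candidates, pick by holdout performance" step can be sketched as below. To keep it short and dependency-free, k-nearest-neighbour classifiers stand in for the candidate neural networks (the candidates differ only in `k`); the function names are hypothetical, and the examples are assumed to be `(features, label)` pairs produced by whatever pre-processing you choose:

```python
import random


def split(examples, holdout_fraction=0.25, seed=0):
    """Shuffle (features, label) pairs and set aside a fraction as the holdout set."""
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * (1 - holdout_fraction))
    return shuffled[:cut], shuffled[cut:]


def knn_predict(train, features, k):
    """Stand-in candidate model: majority label among the k nearest training examples."""
    def sq_dist(example):
        return sum((a - b) ** 2 for a, b in zip(example[0], features))
    labels = [label for _, label in sorted(train, key=sq_dist)[:k]]
    return max(set(labels), key=labels.count)


def holdout_accuracy(train, holdout, k):
    """Fraction of holdout examples the k-NN candidate labels correctly."""
    hits = sum(knn_predict(train, feats, k) == label for feats, label in holdout)
    return hits / len(holdout)


def choose_model(examples, candidate_ks=(1, 3, 5)):
    """Evaluate every candidate on the same holdout set and keep the best one."""
    train, holdout = split(examples)
    scores = {k: holdout_accuracy(train, holdout, k) for k in candidate_ks}
    best = max(scores, key=scores.get)
    return best, scores
```

The important part is that the holdout examples are never used for training, so the score used to pick the final model reflects performance on unseen handwriting.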

The nature of the pre-processing will depend on the data you collect. If it is "connect the dots" pen movement data, then it may be simplest to divide the image into regions, and summarize by the number of dots per region. If, instead, you are recording a raster image, other pre-processing would be useful, such as simple statistics and vertical and horizontal projection profiles (row and column averages).
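
Both kinds of pre-processing mentioned above are short to sketch. This assumes pen data arrives as `(x, y)` points on a pad of known size and raster data as a list of rows of 0/1 (or grey-level) pixel values; `region_counts` and `projection_profiles` are hypothetical helpers, and the 3 x 3 region grid is an arbitrary default:

```python
def region_counts(points, width, height, rows=3, cols=3):
    """Pen data: divide the pad into rows x cols regions and count the dots in each."""
    counts = [0] * (rows * cols)
    for x, y in points:
        r = min(int(y * rows / height), rows - 1)
        c = min(int(x * cols / width), cols - 1)
        counts[r * cols + c] += 1
    return counts


def projection_profiles(image):
    """Raster data: the average of each row followed by the average of each column."""
    row_avgs = [sum(row) / len(row) for row in image]
    col_avgs = [sum(col) / len(col) for col in zip(*image)]
    return row_avgs + col_avgs


print(region_counts([(0, 0), (150, 150), (299, 299)], 300, 300))  # [1, 0, 0, 0, 1, 0, 0, 0, 1]
print(projection_profiles([[0, 1], [1, 1]]))                      # [0.5, 1.0, 0.5, 1.0]
```

Either feature vector can then be fed to the network in place of the raw points or pixels.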

"Dr. Dobb's Journal" ran a handprinting recognition contest some years ago (using electronic ink data). You can read about it here:

http://www.drdobbs.com/184408743;jsessionid=IG5ALGCW1HZZVQE1GHPCKH4ATMY32JVN?pgno=4

...and here:

http://www.drdobbs.com/184408923;jsessionid=IG5ALGCW1HZZVQE1GHPCKH4ATMY32JVN?pgno=2

Predictor