
I want to automatically divide an image of ancient handwritten text into lines (and into words in the future).

The first obvious part is preprocessing the image...

I'm just using simple binarization (based on pixel brightness). After that I store the data in a two-dimensional array.
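A minimal sketch of this step (my code, not the author's), assuming the pixel brightness values (0-255) have already been read into a byte array; the threshold of 128 is a placeholder, not a value from the post:

```csharp
// true = black (text) pixel; threshold 128 is a placeholder value
static bool[,] Binarize(byte[,] brightness, byte threshold = 128)
{
    int w = brightness.GetLength(0), h = brightness.GetLength(1);
    var bits = new bool[w, h];
    for (int x = 0; x < w; x++)
        for (int y = 0; y < h; y++)
            bits[x, y] = brightness[x, y] < threshold;
    return bits;
}
```

A fixed global threshold like this is the simplest possible digitization; a local (adaptive) threshold would be more robust on uneven scans.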

The next obvious part is analyzing the binary array.

  1. My first algorithm was pretty simple: if a row of the array contains more black pixels than the root mean square of the maximum and minimum row counts, then that row is part of a line.

    After forming the list of lines I cut off lines whose height is less than the average. In the end it turned into a kind of linear regression, trying to minimize the difference between blank rows and text rows (I assumed that). First results

  2. My second attempt: I tried to use a GA with several fitness functions. The chromosome contained 3 values: x0, x1, x2, with x0 ∈ [-1, 0], x1 ∈ [0, 0.5], x2 ∈ [0, 0.5].

The function that determines whether a row belongs to a line is (x0 + α1·x1 + α2·x2) > 0, where α1 is the scaled sum of black pixels in the row and α2 is the median of the gaps between neighboring black pixels in the row (α1, α2 ∈ [0, 1]). Other functions I tried are (x1 < α1 OR x2 > α2) and (1/x0 + (α1·x1)/(α2·x2)) > 0; the last function is the most efficient. Results with GA. The fitness function is 1 / (HeightRange + SpacesRange),

where each range is the difference between the maximum and minimum. It represents the homogeneity of the text: the global optimum of this function is the smoothest way to divide the image into lines.
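As a rough sketch (my code, not the author's), the first attempt's RMS row test and the GA fitness 1 / (HeightRange + SpacesRange) could look like this; the +1 in the fitness is my addition, guarding division by zero when both ranges are 0 (the formula as posted doesn't cover that case):

```csharp
using System;
using System.Linq;

// First attempt: a row belongs to a text line when its black-pixel
// count exceeds the root mean square of the maximum and minimum counts.
static bool[] ClassifyRows(int[] rowCounts)
{
    double max = rowCounts.Max(), min = rowCounts.Min();
    double rms = Math.Sqrt((max * max + min * min) / 2.0);
    return rowCounts.Select(c => c > rms).ToArray();
}

// GA fitness 1 / (HeightRange + SpacesRange): each range is max - min,
// so smoother segmentations (uniform heights and gaps) score higher.
static double Fitness(int[] lineHeights, int[] gapHeights)
{
    int heightRange = lineHeights.Max() - lineHeights.Min();
    int spacesRange = gapHeights.Max() - gapHeights.Min();
    return 1.0 / (heightRange + spacesRange + 1); // +1: my div-by-zero guard
}
```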

I am using C# with a self-coded GA (classical, with 2-point crossover, Gray-coded chromosomes, maximum population of 40, mutation rate of 0.05).

Now I have run out of ideas for dividing this image into lines with ~100% accuracy.

What is an efficient algorithm to do this?


UPDATE: Original BMP (1.3 MB)


UPDATE2: Improved results on this text to 100%. New results

How I did it:

  • fixed a minor bug in the range count
  • changed the fitness function to 1/((distancesRange + 1) * (heightsRange + 1))
  • minimized the classifying function to (1/x0 + x2/range) > 0; the number of black pixels in a row no longer affects classification (i.e. I optimized the input data and made the fitness function optimizations more explicit)
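The updated fitness and classifier above could be sketched as follows (my interpretation; I restrict x0 to [-1, 0) since x0 = 0 would divide by zero in the classifier):

```csharp
using System.Linq;

// Updated fitness 1/((distancesRange + 1) * (heightsRange + 1)).
static double Fitness2(int[] lineHeights, int[] lineDistances)
{
    int heightsRange = lineHeights.Max() - lineHeights.Min();
    int distancesRange = lineDistances.Max() - lineDistances.Min();
    return 1.0 / ((distancesRange + 1) * (heightsRange + 1));
}

// Minimized classifier (1/x0 + x2/range) > 0, with x0 in [-1, 0)
// and x2 in [0, 0.5]; range is the row's median gap between black pixels.
static bool RowInLine(double x0, double x2, double range)
{
    return 1.0 / x0 + x2 / range > 0;
}
```

Since x0 is negative, 1/x0 acts as a bias against classifying a row as text, overcome only when x2/range is large enough, i.e. when the row's median gap is small.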

Problem:

Problem

The GA surprisingly failed to recognize this line. I looked at the debug data of the 'find ranges' function and found that there is too much noise in the 'unrecognized' place. The function code is below:

public double[] Ranges()
{
    var ranges = new double[_original.Height];

    for (int y = 0; y < _original.Height; y++)
    {
        ranges[y] = 0;
        var dx = new List<int>();
        int last = 0;
        int x = 0;

        //find the first black pixel in the row
        //(note: a black pixel at x == 0 is treated as "not found")
        while (last == 0 && x < _original.Width)
        {
            if (_bit[x, y])
                last = x;
            x++;
        }

        //no black pixels in this row; patched to the maximum below
        if (last == 0)
        {
            ranges[y] = 0;
            continue;
        }

        //collect the gaps between consecutive black pixels
        for (x = last; x < _original.Width; x++)
        {
            if (!_bit[x, y]) continue;

            if (last != x - 1)
            {
                dx.Add((x - last) + 1);
            }
            last = x;
        }

        //use the median gap; rows with too few gaps count as empty
        if (dx.Count > 2)
        {
            dx.Sort();
            ranges[y] = dx[dx.Count / 2];
            //ranges[y] = dx.Average();
        }
        else
            ranges[y] = 0;
    }

    //hack: empty rows take the maximum range, otherwise their 0
    //would look like the best (smallest) value to the optimizer
    var maximum = ranges.Max();
    for (int i = 0; i < ranges.Length; i++)
    {
        if (Math.Abs(ranges[i] - 0) < 0.9)
            ranges[i] = maximum;
    }
    return ranges;
}

I'm using some hacks in this code. The main reason: I want to minimize the range between the nearest black pixels, but if there are no pixels the value becomes 0, and the problem can no longer be solved by finding optima. The second reason: this code changes too frequently. I'd like to rewrite it completely, but I have no idea how.
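One possible refactor (my sketch, not the author's code) separates a pure "median gap or missing" pass from the substitution pass, so the 0-as-sentinel hack disappears. Note that I use the plain pixel distance x - last for a gap, where the original adds 1:

```csharp
using System.Collections.Generic;
using System.Linq;

// For each row: median gap between neighboring black pixels, or null
// when the row has too few gaps to measure. A second pass substitutes
// the worst (largest) observed gap, replacing the 0-sentinel trick.
static double[] MedianGaps(bool[,] black)
{
    int w = black.GetLength(0), h = black.GetLength(1);
    var gaps = new double?[h];
    for (int y = 0; y < h; y++)
    {
        var dx = new List<int>();
        int last = -1;
        for (int x = 0; x < w; x++)
        {
            if (!black[x, y]) continue;
            if (last >= 0 && x - last > 1)
                dx.Add(x - last);   // plain distance; the original adds 1
            last = x;
        }
        if (dx.Count > 2)
        {
            dx.Sort();
            gaps[y] = dx[dx.Count / 2];
        }
    }
    double worst = gaps.Where(v => v.HasValue)
                       .Select(v => v.Value)
                       .DefaultIfEmpty(0.0)
                       .Max();
    return gaps.Select(v => v ?? worst).ToArray();
}
```

Using a nullable value for "no measurable gap" makes the substitution policy explicit and easy to change (e.g. to a large constant instead of the maximum).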

Q:

  1. Is there a more efficient fitness function?
  2. How can I find a more versatile determination function?
Ernado
    I know that SIFT has been used successfully in handwritten text segmentation but I have no hands on experience. – pnezis Nov 04 '11 at 20:11
    I'm an algo newbie, but I think I found some sites which discussed using hidden Markov models to do text recognition. If it can recognize text, maybe it can also recognize spaces/new words... – Lostsoul Nov 04 '11 at 20:49
    I found this link with some code..doesn't do exactly what you want but may give you an idea and then you can modify it for your needs. http://www.codeproject.com/Articles/69647/Hidden-Markov-Models-in-C – Lostsoul Nov 04 '11 at 21:11
    Please post an image of the clear text (without your processing marks) so we can play a little – Dr. belisarius Nov 05 '11 at 04:28
  • Also, I am not sure about the problem with your second algorithm, besides the fact that the last short line is not recognized – Dr. belisarius Nov 05 '11 at 05:12
  • Updated the post with the link to the original image. The main problem with the GA is finding a good fitness function and a trigger function that determines whether a row belongs to a line. – Ernado Nov 05 '11 at 09:32
  • I think this is the wrong place to post this question. Handwriting recognition is a vast research topic with a lot of publications. A simple search in scholar.google.com would have helped you far beyond your imagination. You don't need to reinvent the wheel all over again. – scigor Nov 05 '11 at 09:42
  • @inf.ig.sh I don't need to recognize this text. Also I can't access any of the publications given by scholar.google.com. – Ernado Nov 05 '11 at 10:16
    @Ernado An important part of text recognition is text segmentation. If you click on "versions" you will discover that about 25-30% of the publications can be downloaded as pdf. – scigor Nov 05 '11 at 10:46
  • @inf.ig.sh i finally found it, thank you. – Ernado Nov 05 '11 at 11:37
  • GA will probably never perform 100%. Your results seem pretty good. – ldog Nov 06 '11 at 03:20
  • Updated the main post. I think the GA can perform better. – Ernado Nov 06 '11 at 10:28
  • This question would benefit from the [tag:image-segmentation] tag, but I don't presume to know which other tag to jettison to make space. – hippietrail Sep 08 '14 at 03:54
  • The 10 Million Question Meta Post led me here. Awesome answer. +1. – rayryeng Aug 26 '15 at 23:35

3 Answers


Although I'm not sure how to translate the following algorithm into GA (and I'm not sure why you need to use GA for this problem), and I could be off base in proposing it, here goes.

The simple technique I would propose is to count the number of black pixels per row. (Actually it's the dark pixel density per row.) This requires very few operations, and with a few additional calculations it's not difficult to find peaks in the pixel-sum histogram.

A raw histogram will look something like this, where the profile along the left side shows the number of dark pixels in a row. For visibility, the actual count is normalized to stretch out to x = 200.

raw horizontal count

With some additional simple processing (described below), we can generate a histogram like this, which can be clipped at some threshold value. What remains are peaks indicating the centers of the lines of text.

processed horizontal count

From there it's a simple matter to find the lines: just clip (threshold) the histogram at some value such as 1/2 or 2/3 the maximum, and optionally check that the width of the peak at your clipping threshold is some minimum value w.
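The clipping step could be sketched like this (my code; fraction and minWidth are illustrative defaults, not tuned values from the answer):

```csharp
using System.Collections.Generic;
using System.Linq;

// Clip the smoothed histogram at fraction * max and keep runs of rows
// at least minWidth tall; each run is one detected line of text.
static List<(int Start, int End)> FindLines(float[] histogram, float fraction = 0.5f, int minWidth = 3)
{
    float clip = histogram.Max() * fraction;
    var lines = new List<(int, int)>();
    int start = -1;
    for (int y = 0; y <= histogram.Length; y++)
    {
        bool above = y < histogram.Length && histogram[y] >= clip;
        if (above && start < 0)
        {
            start = y;                  // a run of hot rows begins
        }
        else if (!above && start >= 0)
        {
            if (y - start >= minWidth)  // keep only sufficiently tall runs
                lines.Add((start, y - 1));
            start = -1;
        }
    }
    return lines;
}
```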

One implementation of the full (yet still simple!) algorithm to find the nicer histogram is as follows:

  1. Binarize the image using a "moving average" threshold or similar local thresholding technique in case a standard Otsu threshold operating on pixels near edges isn't satisfactory. Or, if you have a nice black-on-white image, just use 128 as your binarization threshold.
  2. Create an array to store your histogram. This array's length will be the height of the image.
  3. For each pixel (x,y) in the binarized image, count the number of dark pixels above and below (x,y) within some radius R. That is, count the dark pixels from (x, y - R) to (x, y + R), inclusive.
  4. If the number of dark pixels within the vertical radius R is greater than or equal to R (that is, at least half the pixels are dark), then pixel (x,y) has sufficient vertical dark neighbors. Increment the bin count for row y.
  5. As you march along each row, track the leftmost and rightmost x-values for pixels with sufficient neighbors. As long as the width (right - left + 1) exceeds some minimum value, divide the total count of dark pixels by this width. This normalizes the count to ensure the short lines like the very last line of text are included.
  6. (Optional) Smooth the resulting histogram. I just used the mean over 3 rows.

The "vertical count" (step 3) eliminates horizontal strokes that happen to lie above or below the center line of text. A more sophisticated algorithm would check not only directly above and below (x,y), but also to the upper left, upper right, lower left, and lower right.

With my rather crude implementation in C# I was able to process the image in less than 75 milliseconds. In C++, and with some basic optimization, I've little doubt the time could be cut down considerably.

This histogram method assumes the text is horizontal. Since the algorithm is reasonably fast, you may have enough time to calculate pixel count histograms at increments of every 5 degrees from the horizontal. The scan orientation with the greatest peak/valley differences would indicate the rotation.

I'm not familiar with GA terminology, but if what I've suggested is of some value I'm sure you can translate it into GA terms. In any case, I was interested in this problem anyway, so I might as well share.

EDIT: for GA use, it may be better to think in terms of "distance since the previous dark pixel in X" (or along angle theta) and "distance since the previous dark pixel in Y" (or along angle theta - pi/2). You might also check the distance from white pixels to dark pixels in all radial directions (to find loops).

byte[,] arr = get2DArrayFromBitmap();  //source array from originalBitmap
int w = arr.GetLength(0);               //width of 2D array
int h = arr.GetLength(1);               //height of 2D array

//we can use a second 2D array of dark pixels that belong to vertical strokes
byte[,] bytes = new byte[w, h];         //dark pixels in vertical strokes


//initial morph
int r = 4;        //radius to check for dark pixels
int count = 0;    //number of dark pixels within radius

//fill the bytes[,] array only with pixels belonging to vertical strokes
for (int x = 0; x < w; x++)
{
    //for the first r rows, just set pixels to white
    for (int y = 0; y < r; y++)
    {
        bytes[x, y] = 255;
    }

    //assume pixels of value < 128 are dark pixels in text
    for (int y = r; y < h - r - 1; y++)
    {
        count = 0;

        //count the dark pixels above and below (x,y)
        //the window covers 2r+1 pixels, from -r to +r
        for (int j = -r; j <= r; j++)
        {
            if (arr[x, y + j] < 128) count++;
        }

        //if half the pixels are dark, [x,y] is part of vertical stroke
        bytes[x, y] = count >= r ? (byte)0 : (byte)255;
    }

    //for the last r rows, just set pixels to white
    for (int y = h - r - 1; y < h; y++)
    {
        bytes[x, y] = 255;
    }
}

//count the number of valid dark pixels in each row
float max = 0;

float[] bins = new float[h];    //normalized "dark pixel strength" for all h rows
int left, right, width;         //leftmost and rightmost dark pixels in row
bool dark = false;              //tracking variable

for (int y = 0; y < h; y++)
{
    //initialize values at beginning of loop iteration
    left = 0;
    right = 0;
    width = 100;

    for (int x = 0; x < w; x++)
    {
        //use value of 128 as threshold between light and dark
        dark = bytes[x, y] < 128;  

        //increment bin if pixel is dark
        bins[y] += dark ? 1 : 0;    

        //update leftmost and rightmost dark pixels
        if (dark)
        {
            if (left == 0) left = x;    
            if (x > right) right = x;   
        }
    }

    width = right - left + 1;

    //for bins with few pixels, treat them as empty
    if (bins[y] < 10) bins[y] = 0;      

    //normalize value according to width
    //divide bin count by width (leftmost to rightmost)
    bins[y] /= width;

    //calculate the maximum bin value so that bins can be scaled when drawn
    if (bins[y] > max) max = bins[y];   
}

//calculate the smoothed value of each bin i by averaging bins i-1, i, and i+1
float[] smooth = new float[bins.Length];

smooth[0] = bins[0];
smooth[smooth.Length - 1] = bins[bins.Length - 1];

for (int i = 1; i < bins.Length - 1; i++)
{
    smooth[i] = (bins[i - 1] + bins[i] + bins[i + 1])/3;
}

//create a new bitmap based on the original bitmap, then draw bins on top
Bitmap bmp = new Bitmap(originalBitmap);

using (Graphics gr = Graphics.FromImage(bmp))
{
    for (int y = 0; y < bins.Length; y++)
    {
        //scale each bin so that it is drawn 200 pixels wide from the left edge
        float value = 200 * (float)smooth[y] / max;
        gr.DrawLine(Pens.Red, new PointF(0, y), new PointF(value, y)); 
    }
}

pictureBox1.Image = bmp;
Rethunk
  • Thank you for answering. I can't understand how to calculate R. Is it some constant? – Ernado Jan 17 '12 at 05:01
  • I need to translate segmentation algorithms to GA when they have constants that were obtained empirically, because the GA can increase the percentage of positive segmentations. Sometimes it affects the speed negatively, but not always (as in the case of rotating the image). – Ernado Jan 17 '12 at 05:18
    You're welcome. Based on your image, I picked an R of 4 pixels. You might test several different values of R. Rather than use some fixed value of the radius, it might be better to determine the vertical distance between the current pixel and the closest dark pixel above it (in the -y direction). – Rethunk Jan 17 '12 at 05:23
    At a rough guess, you might automatically calculate R (the +/- vertical search radius) as some fraction of the median height of unbroken vertical runs of dark pixels. Within the lines of text it appears that many vertical strokes are roughly of the same height. – Rethunk Jan 17 '12 at 05:32
  • How do I calculate the neighbors of points that have y < R or y > (ImageHeight - R)? Just ignore points that are out of range? – Ernado Jan 18 '12 at 09:16
  • And at step 5, is the count of dark pixels the Histogram[y] value? I can't implement this step: the count of points with sufficient neighbours is always smaller than the width of the line. – Ernado Jan 18 '12 at 10:07
  • Multiplied the Histogram[y] by the width of the image before dividing, but I'm not sure that I'm doing it right. – Ernado Jan 18 '12 at 10:42
    Once you have the raw histogram count, you want to divide that bin's raw sum by the width occupied by the dark pixels in that row. For example, if the first dark pixel is encountered at x = 100, and the last dark pixel in a row is encountered at x = 250, then you normalize the bin by dividing the raw count by the width 150 (= 250 - 100). I also used a minimum value for the width of about 50, I think, to ensure that small strokes don't yield very large bin counts. – Rethunk Jan 19 '12 at 04:07
  • For this particular image, I ignore pixels close to the edges. For a more general solution, treat all pixels outside the image as white since we have no information about them. Note that in step 5 you take the total dark pixel count for a row and **divide** by the width from the leftmost to the rightmost dark pixel in that row. That is, Density = (# of dark pixels) / (span of dark pixels from left to right). If you get stuck I can dig out my C# code, add some more comments, and send it to you. – Rethunk Jan 19 '12 at 04:11
  • So do we take the dark pixels, or the pixels with enough neighbours, in the same image row? Yes, please, that would be very helpful. I also have no idea how to smooth the resulting histogram. – Ernado Jan 20 '12 at 22:04
    Code added. That's as far as I can take it. Good luck! – Rethunk Jan 25 '12 at 23:03

After fiddling around with this for a while, I found that I simply need to count the number of crossings for each row: a switch from white to black counts as one, and a switch from black to white increments the count again. By highlighting each row with a count > 66, I got close to 100% accuracy, except for the bottom-most line.

Of course, this would not be robust to slightly rotated scanned documents, and there is the disadvantage of needing to determine the correct threshold.
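The crossing count described above could be sketched like this (my code; the threshold of 66 from the answer is empirical and image-specific):

```csharp
// Crossing count per row: each white<->black transition along the row
// increments the count; text rows produce many crossings.
static int[] CrossingsPerRow(bool[,] black)   // true = black pixel
{
    int w = black.GetLength(0), h = black.GetLength(1);
    var crossings = new int[h];
    for (int y = 0; y < h; y++)
        for (int x = 1; x < w; x++)
            if (black[x, y] != black[x - 1, y])
                crossings[y]++;
    return crossings;
}
```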

Hao Wooi Lim
  • Thank you. I'll try this approach soon. GA can do the determination of 'good' segmentation and, hopefully, give 100% accuracy. – Ernado Nov 07 '11 at 14:40

IMHO, with the image shown, it would be very hard to do this 100% perfectly. My answer is to give you some alternative ideas.

Idea 1: Make your own version of reCAPTCHA (to put on your own site) and make it a fun game: cut out a word (edges should all be whitespace, with some tolerance for overlapping characters from the lines above and below).

Idea 2: This was a game we played as kids: a coat-hanger wire was bent into waves and connected to a buzzer, and you had to guide a wand with a ring at the end along the wire, from one side to the other, without making the buzzer go off. Perhaps you could adapt this idea and make a mobile game where people trace out the lines without touching the black text (with tolerance for overlapping characters). When they complete a line they get points and advance to new levels with harder images.

Idea 3: Research how Google/reCAPTCHA got around this problem.

Idea 4: Get the Photoshop SDK and master the functionality of its Extract Edges tool.

Idea 5: Stretch the image a lot on the Y axis (which should help), apply the algorithm, then scale the location measurements back down and apply them to the normal-sized image.

Jeremy Thompson
  • Thank you. It must be an offline application, so I'll implement your ideas 1-3 when it becomes an online service that doesn't have strict demands on segmentation speed. Stretching is an interesting idea. I just need fast segmentation that can find all of the lines. – Ernado Nov 05 '11 at 09:22
  • @Ernado You're welcome, and thanks for asking such an interesting question here on SO. There are many talented people in this community. I hope you get more replies, as this topic interests me. Cheers – Jeremy Thompson Nov 05 '11 at 12:10
  • While I appreciate the answer, I think there are sometimes valid reasons to use an algorithmic approach to solve certain problems rather than relying on a human-powered approach, especially when problems like these can largely be solved by an algorithm alone. – Hao Wooi Lim Nov 06 '11 at 13:18
  • @Hao Wooi Lim, I agree with you, and so would any programmer who uses orthodox methods, but this problem can't largely be solved with an algorithm. That's why IMHO it would be easier to achieve 100% accuracy for this by getting humans to do it. – Jeremy Thompson Nov 07 '11 at 03:52