I am working on line segmentation of cursive text (Arabic, Urdu). Text lines are detected correctly by computing the density of dark pixels in each row. Consecutive rows that contain more than a threshold number of dark pixels are cropped using the following code:
% line breaker
divisions = [MaxPixelsPerLine(1); MaxPixelsPerLine(difference > 10)];
% use divisions in a loop to segment all lines
for i = 1:numel(divisions) - 1
    line = img(divisions(i):divisions(i+1), :);
end
output:
In this output, the segmented lines contain parts of words from adjacent lines. I want the cut between two adjacent lines not to split an overlapping character into two parts. If a small connected component (as in this case) or a dot/diacritic of a character from one line extends into the adjacent line, it should be cropped together with the line it belongs to.
This is the desired output:
I don't want a different algorithm or technique. How can I modify this algorithm to get the desired results?
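To make the requirement concrete, here is a minimal sketch of the behaviour I am after, not a tested solution. It keeps the row cuts from the density step, but assigns each connected component to the band that holds most of its pixels, and then crops each line from only the components it owns. It assumes the Image Processing Toolbox (`bwlabel`) and a binary image with ink as `true`; the synthetic image `bw`, the cut rows in `divisions`, and the names `owner`, `votes`, and `lineImgs` are all made up for illustration.

```matlab
% Synthetic binary image (true = ink): two text lines plus one small
% component that straddles the cut between them.
bw = false(20, 30);
bw(3:8,   2:12)  = true;   % body of line 1
bw(13:18, 2:12)  = true;   % body of line 2
bw(9:13,  20:22) = true;   % small component crossing the cut at row 10

divisions = [1; 10; 20];   % hypothetical row cuts from the density step
numLines  = numel(divisions) - 1;

% Label connected components (8-connectivity).
[labels, numComp] = bwlabel(bw, 8);

% Assign each component to the band containing most of its pixels.
owner = zeros(numComp, 1);
for c = 1:numComp
    [rows, ~] = find(labels == c);
    votes = zeros(numLines, 1);
    for k = 1:numLines
        votes(k) = sum(rows >= divisions(k) & rows <= divisions(k+1));
    end
    [~, owner(c)] = max(votes);
end

% Build each line image from its own components only, cropped to the
% rows those components actually occupy.
lineImgs = cell(numLines, 1);
for k = 1:numLines
    mask = ismember(labels, find(owner == k));
    rowsUsed = find(any(mask, 2));
    if isempty(rowsUsed)
        lineImgs{k} = false(0, size(bw, 2));   % no component in this band
    else
        lineImgs{k} = mask(rowsUsed(1):rowsUsed(end), :);
    end
end
```

With this rule the straddling component ends up whole in the second line instead of being split at row 10. A majority vote only handles components that cross a cut; a dot lying entirely inside the neighbouring band would still be assigned to the wrong line.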
Thanks.