
I am working on line segmentation of cursive Arabic and Urdu text. Text lines are detected correctly by computing the density of dark pixels in each row: runs of consecutive rows with more than a threshold number of dark pixels are cropped as one line, using the following code:

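% Line breaker
% (MaxPixelsPerLine and difference are computed earlier; not shown)
divisions = [MaxPixelsPerLine(1); MaxPixelsPerLine(difference > 10)];

% Use divisions in a loop to segment all lines
for i = 1:numel(divisions) - 1
    lineImg = img(divisions(i):divisions(i+1), :);   % crop the i-th text line
    % ... process or save lineImg here
end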
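Because MaxPixelsPerLine and difference are not defined in the snippet, here is a minimal self-contained sketch of the same projection-profile idea in base MATLAB, on a hypothetical toy image. The threshold minPixels and all variable names are assumptions, and the sketch crops each run of dense rows rather than reproducing the divisions indexing exactly.

```matlab
% Minimal sketch of projection-profile line segmentation (base MATLAB only).
% Assumption: img is a logical image with text (dark) pixels set to true.
img = false(12, 8);
img(2:4, 2:7) = true;               % toy text line 1
img(8:10, 1:6) = true;              % toy text line 2

minPixels = 0;                      % hypothetical density threshold
rowDensity = sum(img, 2);           % number of dark pixels in each row
isText = rowDensity > minPixels;    % rows that belong to some text line

% Runs of consecutive text rows: +1 marks a run start, -1 the row after its end.
edges = diff([0; isText; 0]);
lineStarts = find(edges == 1);
lineEnds = find(edges == -1) - 1;

lineImgs = cell(numel(lineStarts), 1);
for k = 1:numel(lineStarts)
    lineImgs{k} = img(lineStarts(k):lineEnds(k), :);   % crop one text line
end
```

With a threshold above zero, sparse rows that hold only dots or descender tips fall outside every run, and a straight row cut cannot keep a character that overlaps two lines in one piece.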

Output:

[image 1] [image 2]

In this output, the segmented lines contain parts of words from adjacent lines. I want the cut between two adjacent lines not to split an overlapping character into two parts. If a small connected component or a dot/diacritic of a character from one line extends into the adjacent line, it should be cut out together with the line it belongs to.

This is the desired output:

[image 3]
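One way to keep the straight cut but stop it from splitting characters is to label 8-connected components and move each whole component to the side of the cut that holds most of its pixels. This is a minimal sketch in base MATLAB, using an explicit stack-based labeling instead of bwlabel from the Image Processing Toolbox; the toy image and cutRow are hypothetical.

```matlab
% Toy image: upper line with a descender crossing the cut, a lone dot, lower line.
img = false(10, 6);
img(2:3, 1:6) = true;      % body of the upper line
img(4:6, 2) = true;        % descender of the upper line, crossing the cut
img(6, 5) = true;          % isolated dot just below the cut
img(8:9, 1:6) = true;      % body of the lower line
cutRow = 5;                % hypothetical straight cut (rows <= cutRow = upper)

% Label 8-connected components with an explicit stack (no toolbox needed).
[nRows, nCols] = size(img);
labels = zeros(nRows, nCols);
nComp = 0;
for p = find(img)'
    if labels(p) ~= 0
        continue;                          % already part of a component
    end
    nComp = nComp + 1;
    labels(p) = nComp;
    stack = p;
    while ~isempty(stack)
        q = stack(end);
        stack(end) = [];
        [r, c] = ind2sub([nRows, nCols], q);
        for dr = -1:1
            for dc = -1:1
                rr = r + dr;
                cc = c + dc;
                if rr >= 1 && rr <= nRows && cc >= 1 && cc <= nCols ...
                        && img(rr, cc) && labels(rr, cc) == 0
                    labels(rr, cc) = nComp;
                    stack(end + 1) = sub2ind([nRows, nCols], rr, cc);
                end
            end
        end
    end
end

% Assign each whole component to the side of the cut holding most of its pixels.
rowIdx = repmat((1:nRows)', 1, nCols);
isUpper = false(nRows, nCols);
for k = 1:nComp
    inComp = labels == k;
    above = nnz(inComp & rowIdx <= cutRow);
    if above >= nnz(inComp) - above
        isUpper(inComp) = true;
    end
end
upperLine = img & isUpper;     % pixels cropped with the upper line
lowerLine = img & ~isUpper;    % pixels cropped with the lower line
```

The lone dot illustrates the limitation Ben Voigt points out in the comments: a disconnected diacritic is assigned only by which side of the cut it falls on, which may not be the line it really belongs to.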

I don't want a different algorithm or technique. How can I modify this algorithm to get the desired result?

Thanks.

saud00
  • It looks like the descenders don't just cross the cut; they actually share vertical space with ascenders from the line below. Straight lines (even angled ones) cannot give a perfect separation in this scenario. I think you could even produce examples where a descender actually intersects the ascenders of the line below. – Ben Voigt Apr 17 '18 at 22:35
  • In other words, you need to include the band where the ascenders and descenders exist into BOTH lines. – Ben Voigt Apr 17 '18 at 22:38
  • I attached one more output image, in which descenders are cut off into the next line. This is what I want to tackle with your help. – saud00 Apr 18 '18 at 00:43
  • Is it possible that, once the break points for segmenting the lines have been selected, then while segmenting a line, whenever inverted (text) pixels lie in the way, the cut goes around the contour of that character (whether it is an overhanging character or a dot/diacritic) instead of running straight? – saud00 Apr 18 '18 at 00:52
  • As long as they don't actually intersect, then yes you can try to follow contours. You are no longer talking about a dividing line though, but a complex curve. – Ben Voigt Apr 18 '18 at 00:53
  • I edited the post to show the desired output; I hope it is easier to understand now. – saud00 Apr 18 '18 at 01:07
  • No! The dividing line was the first priority, and that is accomplished. By following contours, I mean that when any dark (text) pixel lies in the way of the row/break point selected for partitioning the lines, the cut should go alongside that overhanging character and then continue along that row/break point. – saud00 Apr 18 '18 at 01:17
  • Possible duplicate of [Is there an efficient algorithm for segmentation of handwritten text?](https://stackoverflow.com/questions/8015001/is-there-an-efficient-algorithm-for-segmentation-of-handwritten-text) – John Apr 18 '18 at 01:39
  • For your newest image, you have connected descenders. For these I think it would be sufficient to apply a flood-fill algorithm (start from the cut line and follow strokes below as long as the pixels form a contiguous region). I guess for your first image, that won't be good enough because there are some "islands" of disconnected strokes. To show you how difficult this problem is, there are a few strokes where I can't tell whether they belong with the line above or below (I don't know this script). You may need an algorithm with knowledge of the glyph shapes. – Ben Voigt Apr 18 '18 at 01:42
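Ben Voigt's suggestion to include the ascender/descender band in both lines can be sketched by cropping each line from the end of the previous dense core to the start of the next one, so that every gap band appears in both neighboring crops. The core row ranges below are hypothetical values (for example, runs found by a projection profile with a fairly high threshold).

```matlab
% Hypothetical dense-core rows of three text lines; the rows between two
% cores hold the ascenders and descenders shared by neighboring lines.
coreStarts = [3; 12; 21];
coreEnds = [7; 16; 25];
img = false(30, 10);            % placeholder image with 30 rows
nRows = size(img, 1);

% Each crop spans the whole gap above and below its core, so every gap
% band is included in BOTH adjacent lines.
cropTop = [1; coreEnds(1:end-1) + 1];
cropBottom = [coreStarts(2:end) - 1; nRows];

lineImgs = cell(numel(coreStarts), 1);
for k = 1:numel(coreStarts)
    lineImgs{k} = img(cropTop(k):cropBottom(k), :);
end
```

The trade-off is that each crop now also contains stray strokes from its neighbors, which still have to be masked out.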

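The flood-fill idea from the last comment can be sketched as follows: seed from the dark pixels on the cut row, grow through 8-connected dark pixels on or below that row, and give everything reached to the upper line. This is a minimal base-MATLAB sketch on a hypothetical toy image; cutRow is assumed to come from the existing break points.

```matlab
% Toy image: only the descender of the upper line crosses the cut row.
img = false(10, 6);
img(2:3, 1:6) = true;      % body of the upper line
img(4:6, 3) = true;        % connected descender crossing the cut
img(8:9, 1:6) = true;      % body of the lower line
cutRow = 4;                % hypothetical straight cut from the break points

[nRows, nCols] = size(img);
reached = false(nRows, nCols);          % stroke pixels followed from the cut
seedCols = find(img(cutRow, :));        % dark pixels lying on the cut row
stack = sub2ind([nRows, nCols], repmat(cutRow, 1, numel(seedCols)), seedCols);
reached(stack) = true;

% Flood fill through 8-connected dark pixels, never going above the cut row.
while ~isempty(stack)
    q = stack(end);
    stack(end) = [];
    [r, c] = ind2sub([nRows, nCols], q);
    for dr = -1:1
        for dc = -1:1
            rr = r + dr;
            cc = c + dc;
            if rr >= cutRow && rr <= nRows && cc >= 1 && cc <= nCols ...
                    && img(rr, cc) && ~reached(rr, cc)
                reached(rr, cc) = true;
                stack(end + 1) = sub2ind([nRows, nCols], rr, cc);
            end
        end
    end
end

% Upper line: everything above the cut plus the strokes followed below it.
rowIdx = repmat((1:nRows)', 1, nCols);
upperLine = img & (rowIdx <= cutRow | reached);
lowerLine = img & ~upperLine;
```

As the comment warns, if a descender touches a stroke of the line below, the fill leaks into that line, and islands that do not touch the cut row (such as separate dots) are never reached.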