I want to extract characters in a sequence. For example, given this image:

Here's the code I wrote:

[L Ne]=bwlabel(BinaryImage);
stats=regionprops(L,'BoundingBox');
cc=vertcat(stats(:).BoundingBox);
aa=cc(:,3);
bb=cc(:,4);
hold on
figure
for n=1:size(stats,1)
    if (aa(n)/bb(n) >= 0.2 && aa(n)/bb(n)<= 1.25)
        [r,c] = find(L==n);
        n1=BinaryImage(min(r):max(r),min(c):max(c));
        imshow(~n1);
        pause(0.5)
    end
    hold off
end

What changes should I make for a proper sequence?

Nomi

1 Answer

bwlabel assigns its labels by scanning the image in column-major order (down each column, then on to the next column to the right), and regionprops returns its structures in that label order. Neither works in row-major order, which is what you are looking for. Column-major ordering is MATLAB's native behaviour, so find also returns its results that way. You have to keep both of these things in mind when trying to display your characters in row-major order.
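To see column-major ordering without any toolbox functions, here is a small sketch on a made-up 2-by-3 matrix. The matrix A is purely illustrative and is not taken from the question:

```matlab
%// Hypothetical 2x3 logical matrix, used only for illustration
A = logical([1 0 1;
             0 1 0]);

%// find scans down each column first, then moves on to the next column
[r, c] = find(A);

%// Each displayed row is one (row, column) pair, in the order find visits them
disp([r c]);
```

The pairs come out as (1,1), (2,2), (1,3), that is, column by column rather than row by row.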


As such, a simple approach is to modify your for loop so that you access the structure row-wise instead of column-wise. For your example image, the characters are labelled in this order:

 1   3   5
 2   4   6

You would need to access the structure in the following order: [1 3 5 2 4 6]. Therefore, you would change your for loop to access this new array and you can create this new array like so:

ind = [1:2:numel(stats) 2:2:numel(stats)];
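As a quick check, assuming six blobs (numBlobs below is a stand-in for numel(stats), which would come from the real image), the expression produces the access order listed above:

```matlab
numBlobs = 6;                 %// hypothetical stand-in for numel(stats)

oddLabels  = 1:2:numBlobs;    %// 1 3 5 -> top row in the layout above
evenLabels = 2:2:numBlobs;    %// 2 4 6 -> bottom row in the layout above

ind = [oddLabels evenLabels]; %// top row first, then bottom row
disp(ind);
```

This only works when the labels alternate strictly between the two rows, as they do in the layout shown.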

Once you do that, just modify your for loop to iterate over the values in ind instead. To make your code fully reproducible, I'm going to read your image directly from its URL and invert it, as the text is black. The text needs to be white for the blob analysis to be successful:

%// Added
clear all; close all;
BinaryImage = ~im2bw(imread('http://s4.postimg.org/lmz6uukct/plate.jpg'));

[L Ne]=bwlabel(BinaryImage);
stats=regionprops(L,'BoundingBox');
cc=vertcat(stats(:).BoundingBox);
aa=cc(:,3);
bb=cc(:,4);
figure;

ind = [1:2:numel(stats) 2:2:numel(stats)]; %// Change
for n = ind %// Change
    if (aa(n)/bb(n) >= 0.2 && aa(n)/bb(n)<= 1.25)
        [r,c] = find(L==n);
        n1=BinaryImage(min(r):max(r),min(c):max(c));
        imshow(~n1);
        pause(0.5)
    end
end

Warning

The above code assumes that there are only two rows of characters and that the labels alternate between the top and bottom rows. If you have more rows, or the rows hold different numbers of characters, the indices specified will not work.

If you want it to work for multiple lines, the logic I'm going to write assumes that the text is horizontal and not on an angle. Simply put, you loop until you run out of structures. At the beginning of each iteration, you find the unprocessed blob whose bounding box has the smallest y coordinate at its top-left corner. You then find all unprocessed blobs whose top-left y coordinate is within some threshold of this source y coordinate and grab their indices. You repeat this until every structure has been processed.

Something like this:

thresh = 5; %// Declare tolerance

cc=vertcat(stats(:).BoundingBox);
topleft = cc(:,1:2);

ind = []; %// Initialize list of indices
processed = false(numel(stats),1); %// Figure out those blobs that have been processed
while any(~processed) %// While there is at least one blob to look at...
    %// Determine the blob that has the smallest y/row coordinate that's  
    %// unprocessed
    cc_proc = topleft(~processed,:);
    ys = min(cc_proc(:,2));

    %// Find all blobs along the same row that are +/-thresh rows from
    %// the source row
    loc = find(abs(topleft(:,2)-ys) <= thresh & ~processed);

    %// Add to list and mark them off
    ind = [ind; loc];
    processed(loc) = true;
end

ind = ind.'; %// Ensure it's a row

You'd then use the ind variable with the for loop just like before.
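To try the row-grouping loop without an image, here is a minimal sketch on made-up bounding boxes: three characters on the top row and four on the bottom, listed in the order a column-major labelling would plausibly give them. All coordinates are invented for illustration. The sort by x inside the loop is an extra step that is not in the code above; it is an assumption on my part that an explicit left-to-right sort within each row is wanted.

```matlab
%// Made-up bounding boxes [x y width height], listed in label order
cc = [ 5 10 15 30;    %// label 1: top row
       5 50 15 30;    %// label 2: bottom row
      25 52 15 30;    %// label 3: bottom row
      30 11 15 30;    %// label 4: top row
      45 51 15 30;    %// label 5: bottom row
      55  9 15 30;    %// label 6: top row
      65 49 15 30];   %// label 7: bottom row

thresh = 5;                         %// Row tolerance in pixels
topleft = cc(:,1:2);
ind = [];
processed = false(size(cc,1),1);

while any(~processed)
    %// Smallest top-left y among the unprocessed boxes
    ys = min(topleft(~processed,2));

    %// Unprocessed boxes within +/-thresh rows of that y
    loc = find(abs(topleft(:,2)-ys) <= thresh & ~processed);

    %// Extra step (not in the original loop): order the row left to right
    [~, order] = sort(topleft(loc,1));
    loc = loc(order);

    ind = [ind; loc];
    processed(loc) = true;
end
ind = ind.';

%// For comparison: the two-row interleaving from earlier in the answer
interleaved = [1:2:size(cc,1) 2:2:size(cc,1)];

disp(ind);
disp(interleaved);
```

With these boxes the loop gives [1 4 6 2 3 5 7], the top row followed by the bottom row, whereas the interleaved indices give [1 3 5 7 2 4 6], which mixes the two rows because they hold different numbers of characters.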

rayryeng
  • Thanks. It worked. but I cant understand the meaning of 'ind = [1:2:numel(stats) 2:2:numel(stats)];'.. numel just calculate the total number of pixels. How it change column major order to row major order? – Nomi Aug 25 '15 at 17:53
  • Read the post carefully. If you want row major order, you have to access the structure elements in the order of `[1 3 5 2 4 6]`. That statement generates the indices that way. `1:2:numel(stats)` creates a vector starting from 1 up to as many elements as you have skipping over 2. `2:2:numel(stats)` is the same thing but we start at 2. Try looking at the vector in MATLAB yourself and see that it does work. However, let me update the post so that you can adapt this to as many characters as necessary. This post currently assumes there are two rows of characters. – rayryeng Aug 25 '15 at 17:55
  • It worked fine when upper line contain 3 letters and lower line contain 3 digits. But I have certain images with 4 digits in lower line. In such cases it detect the lower line first then detect upper line. images like http://s10.postimg.org/4ww3bjia1/plate.png and http://s10.postimg.org/4ww3bjia1/plate.png – Nomi Aug 25 '15 at 19:27
  • All my images have 2 lines only.. And in addition sometimes it detects randomly (not in sequence). Randomly detect images have some orientation. Maybe this will cause that problem. – Nomi Aug 25 '15 at 19:32
  • @Nomi That's weird... for both of those images, it detected the top level first with the new code I wrote. However, what I have written assumes flat images only without any orientation. If you go with the original post I had with the `[1 3 5 2 4 6]` method, then you are correct in that it will detect the bottom row first. However, with the new code I wrote, it's fine. – rayryeng Aug 25 '15 at 20:43
  • @Nomi - Please make sure you read the edit that has new code at the bottom. Once you run this, you'd use this with the `for` loop at the end. – rayryeng Aug 25 '15 at 20:48
  • Yeah I got it right. With the new edited code indices rightly find the new coordinates left to right. – Nomi Aug 25 '15 at 20:57
  • @Nomi - Great! Glad I could help. – rayryeng Aug 25 '15 at 21:04
  • Thanks for your effort! – Nomi Aug 25 '15 at 21:11
  • @reyryand. All images are working well. Now I want to remove non characters OR characters with smallest size. – Nomi Aug 29 '15 at 11:05
  • @reyryang. I think this should be done on the basis of characters difference. If difference between any character is large as compared to rest of characters difference it should not be considered. Like in this image s10.postimg.org/4ww3bjia1/plate.png how to remove extra character? Hope you will understand. – Nomi Aug 29 '15 at 11:08
  • And I want to remove extra region / small character when image has 2 rows. I don't want to remove anything it in single line row. – Nomi Aug 29 '15 at 11:11
  • For example from this image I want to remove http://s17.postimg.org/5quw8e6m7/plate2.png digit 5. – Nomi Aug 29 '15 at 11:20
  • @Nomi That's a separate question. Please consider asking another question. – rayryeng Aug 29 '15 at 14:06