Implementing the CCL (Connected Component Labeling) algorithm without the union-find data structure?

Question

Hello fellow programmers!

A week ago I have been asigned the task of implementing the Connected Components Algorithm, mainly to extract the number of objects from an image.

You can read more about the algorithm here (https://en.wikipedia.org/wiki/Connected-component_labeling), the variant I am trying to implement is the two pass one.

This is my current attempt:

% ------------------------------------------------------------------------------
% -> Connected Component Labeling (CCL) Algorithm
% -> 4-Connectivity Version
% ------------------------------------------------------------------------------

% ------------------------------------------------------------------------------
% - [ Pre-Scan Code To Get Everything Ready ] -
% ------------------------------------------------------------------------------
% Starting With A Clean (Workspace) And (Command Window).
clear, clc;

% Instead Of Loading An Actual Image, We Are Creating A Matrix Of Zeros And Ones, Representing A Binary Image.
originalImage = [ ...
                0 1 0
                1 0 1
                0 1 0 ];

% Creating A Bigger Matrix That We Will Use To Store The Original Image In Its Middle, This Will Help Us Eliminate Border Checking In The Raster Scan.
binaryImage = zeros(size(originalImage) + 2);

% Copying The Pixels From The Original Image Into The Middle Of The Larger Matrix We Created.
binaryImage(2:size(originalImage, 1) + 1, 2:size(originalImage, 2) + 1) = originalImage;

% Getting The Number Of Rows (Height) And Number Of Columns (Width) Of The Binary Image.
[imageRows, imageColumns] = size(binaryImage);

% Creating A Matrix The Same Dimensions As The Binary Image In Which The Labeling Will Happen.
labeledImage = zeros(imageRows, imageColumns);

% Creating A Label Counter That We Will Use To Assign When We Create New Labels.
labelCounter = 1;

% ------------------------------------------------------------------------------
% - [First Scan: Assigning Labels To Indices] -
% ------------------------------------------------------------------------------
% Going Over Each Row In The Image One By One.
for r = 1:imageRows
   % Going Over Each Column In The Image One By One.
   for c = 1:imageColumns
       % If The Pixel Currently Being Scanned Is A Foreground Pixel (1).
       if (binaryImage(r, c) == 1)
           % Since We Are Working With 4-Connectivity We Only Need To Read 2 Labels, Mainly The (Left) And (Top) Labels.
           % Storing Them In Variables So Referencing Them Is Easier.
           left = labeledImage(r, c - 1);
           top = labeledImage(r - 1, c);
           % If Left == 0 And Top == 0 -> Create A New Label, And Increment The Label Counter, Also Add The Label To The Equivalency List.
           if (left == 0 && top == 0)
               labeledImage(r, c) = labelCounter;
               labelCounter = labelCounter + 1;
           % If Left == 0 And Top >= 1 -> Copy The Top Label.
           elseif (left == 0 && top >= 1)
               labeledImage(r, c) = top;
           % If Left >= 1 And Top == 0 -> Copy The Left Label.
           elseif (left >= 1 && top == 0)
               labeledImage(r, c) = left;
           % If Left >= 1 And Top >= 1 -> Find The Minimum Of The Two And Copy It, Also Add The Equivalent Labels To The Equivalency List, So We Can Fix Them In The Second Scan.
           elseif (left >= 1 && top >= 1)
               labeledImage(r, c) = min(left, top);
           end
       end
   end
end

% ------------------------------------------------------------------------------
% - [Second Scan: Fixing The Connected Pixels But Mismatched Labels] -
% ------------------------------------------------------------------------------
for r = 1:imageRows
   for c = 1:imageColumns

   end
end

This first pass is going through without any issues, I have tried multiple tests on it, however I have no idea how to implement the second pass, in which I have to fix the equivalent labels in the labeled matrix.

I did do my research online, and the preferred way to do it is to use the union-find (disjoint set) data structure to store the equivalences between the labels.

However, since I am using MATLAB and the union-find data structure is not implemented, I have to implement it myself, which is cumbersome and requires massive time and hard work due to MATLAB being an interpreted language.

So, I am open to ideas on implementing the second pass without having to use the union-find data structure.

Thanks in advance!

I see this question is very similar, though not identical, to [your previous one](https://stackoverflow.com/questions/58890281/trying-to-implement-connected-component-labeling-in-matlab). That one didn’t have any answers and so it would have been appropriate to modify it to ask your updated question. The old question is no longer relevant, so either modifying it or deleting it would be appropriate. Do note that you will be limited by the system if you repeatedly ask questions and delete them, so I recommend that you don’t ask a new question in the hopes of getting more attention. — Cris Luengo, Nov 17 '19 at 21:03
Union-find is quite easy to implement though. See [here](https://www.crisluengo.net/archives/913) for a simple description. — Cris Luengo, Nov 17 '19 at 21:07
I assume there's a reason you're not using the connected component labeling functions in MATLAB, correct? — beaker, Nov 18 '19 at 18:35
@beaker It's an educational project, doing it just to learn how everything works together. — saad.sawash, Nov 18 '19 at 18:37

score 0 · Answer 1 · answered Nov 17 '19 at 21:04

0

If you don't want to use union-find, then you should really do the flood-fill-each-component algorithm (the other one in Wikipedia).

Union-find is neither difficult nor slow, however, and it's what I would use to solve this problem. To make a simple and fast implementation:

Use a matrix of integers the same shape as your image to represent the sets -- each integer in the matrix represents the set of the corresponding pixel.
An integer x represents a set as follows: If x < 0, then it is a root set of size -x. If x>=0 then it's a child set with parent set x (i.e., row x/width, column x%width)). Initialize all sets to -1 since they all start off as roots with size 1.
Use union-by-size and path compression

answered Nov 17 '19 at 21:04

Matt Timmermans

53,709
3
46
87

How can you do path compression with the forest structure implemented this way? I would use an auxiliary array for that, such that each label is represented once and path compression is trivial. – Cris Luengo Nov 17 '19 at 21:10
It's very easy, since a set can not be both a child and a root at the same time, and you only need the size of roots. I've implemented this many times. You can find one over here, although this is a similar question with additional constraints (and the representation is a little different in an arbitrary way): https://stackoverflow.com/questions/55284043/algorithm-use-union-find-to-count-number-of-islands/55286369#55286369 – Matt Timmermans Nov 17 '19 at 21:13

Implementing the CCL (Connected Component Labeling) algorithm without the union-find data structure?

1 Answers1