I have an N*N symmetric boolean matrix that describes the relationships between elements.
For example, the matrix
1 0 1
0 1 1
1 1 1
means that element 1 is related to elements 1 and 3, element 2 is related to elements 2 and 3, and so on.
Now I want to cluster the elements. Because the matrix is large (N = 9000), I don't want to use a triple nested for loop; I want to use the union-find algorithm instead.
int labels[N];
static int find(int u){
    return u == labels[u] ? u : (labels[u] = find(labels[u]));
}
static void myunion(int u,int v){
    labels[find(v)] = find(u); // the value of v is always larger than u
}
The code that calls it:
for(int i=0;i<size;i++){
    for(int j=i+1;j<size;j++){
        if(matrix[i][j]==1){
            myunion(i,j); // the value of j is always larger than i
        }
    }
}
The problem is that I always want the smallest index in a cluster to be the cluster label, but sometimes my code produces a different label.
For example, elements 2, 3, and 100 are related. I want the cluster to have label 2, but I get label 100. Could anyone point out my logic error?
I am not sure whether
int pv = find(v);
int pu = find(u);
if(labels[pv]>=labels[pu]){
    labels[pv] = labels[pu];
}
else{
    labels[pu] = labels[pv];
}
works. For example, with {1,2,3} -> label 1 and {4,6} -> label 4, when I call union(3,4), will labels[6] also be changed to 1?
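To make that scenario concrete, here is a minimal sketch of the variant I have in mind. It assumes labels[i] is initialized to i and that the final cluster label is read through find(i) rather than taken from labels[i] directly. Since pu and pv are roots, comparing them should be equivalent to comparing labels[pu] and labels[pv]. The names init_labels, union_min and label_of_6_after_merge are only illustrative.

```cpp
#include <cassert>

const int N = 9000;      // matrix size from the question

static int labels[N];

// Assumption: every element starts as its own cluster.
static void init_labels() {
    for (int i = 0; i < N; i++) labels[i] = i;
}

// Find the root of u, compressing the path along the way.
static int find(int u) {
    return u == labels[u] ? u : (labels[u] = find(labels[u]));
}

// Merge the clusters of u and v; the smaller root stays the root.
static void union_min(int u, int v) {
    int pu = find(u);
    int pv = find(v);
    if (pu < pv) labels[pv] = pu;
    else         labels[pu] = pv;   // no-op when pu == pv
}

// Hypothetical check: build {1,2,3} -> 1 and {4,6} -> 4, then union(3,4).
static int label_of_6_after_merge() {
    init_labels();
    union_min(1, 2);
    union_min(2, 3);
    union_min(4, 6);
    union_min(3, 4);
    // The union only rewrote the root entry labels[4]; labels[6] is still 4.
    assert(labels[6] == 4);
    return find(6);   // follows 6 -> 4 -> 1 and compresses the path
}
```

In this sketch the union itself does not touch labels[6]; it only changes to 1 once find(6) is called afterwards. That suggests a final pass calling find(i) for every i would be needed before reading the labels.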