
I have a matrix A made up of 4 column vectors with 12 elements each:

A = [    0         0         0         0;
    0.0100    0.0100    0.0100         0;
    0.3000    0.2700    0.2400    0.2400;
    0.0400         0    0.0200    0.0200;
    0.1900    0.0400    0.0800    0.0800;
    0.1600    0.6500    0.2100    0.3800;
    0.0600    0.0100    0.0300    0.0200;
    0.1500    0.0100    0.0600    0.1700;
         0         0         0    0.0800;
    0.0300         0    0.0200    0.0100;
    0.0700         0    0.1200    0.0100;
         0         0    0.2300         0]

I also have a similarity matrix that states how similar each vector is to the others:

SIM =[1.00    0.6400    0.7700    0.8300;
    0.6400    1.0000    0.6900    0.9100;
    0.7700    0.6900    1.0000    0.7500;
    0.8300    0.9100    0.7500    1.0000]

Reading the rows of this matrix:

vector 1 is 64% similar to vector 2
vector 1 is 77% similar to vector 3
...

I would like to create a dendrogram that shows how many different groups there are in A, considering a similarity threshold of 0.95 (i.e. if 2 groups have a similarity > 0.7, connect them).

I didn't really understand how to use this function with my data...

gabboshow

1 Answer

I'm not sure I understood your question correctly, but from what I understood, I would do this:

DSIM = squareform(1-SIM); % convert to a dissimilarity vector

This gives:

% DSIM =   0.3600    0.2300    0.1700    0.3100    0.0900    0.2500
% pairs:  1 vs 2,   1 vs 3,   1 vs 4,   2 vs 3,   2 vs 4,   3 vs 4
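If it helps to see which entry of DSIM belongs to which pair of vectors, here is a minimal sketch. It assumes the SIM matrix from the question; the variable names D and pairs are only for illustration. It also checks the two things squareform needs from its input: a symmetric matrix with a zero diagonal.

```matlab
% Similarity matrix copied from the question
SIM = [1.00 0.64 0.77 0.83;
       0.64 1.00 0.69 0.91;
       0.77 0.69 1.00 0.75;
       0.83 0.91 0.75 1.00];

D = 1 - SIM;                         % dissimilarity matrix

% squareform expects a symmetric matrix with zeros on the diagonal
assert(isequal(D, D.'), 'D must be symmetric');
assert(all(diag(D) == 0), 'diagonal of D must be zero');

DSIM  = squareform(D);               % 1-by-6 row vector of pairwise dissimilarities
pairs = nchoosek(1:size(SIM, 1), 2); % pair indices, in the same order as DSIM

% Print each pair of vector indices next to its dissimilarity
disp([pairs, DSIM.']);
```

Each printed row is [i, j, dissimilarity], so the fifth row corresponds to vectors 2 and 4, the most similar pair.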

Then compute the linkage:

Z = linkage(DSIM,'average'); % linkage methods other than 'average' are available
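To see what the linkage step produced, you can look at Z directly. The sketch below assumes the SIM matrix from the question; the variable heights is just an illustrative name. Each row of Z describes one merge, and its third column is the dissimilarity at which that merge happens.

```matlab
% Similarity matrix copied from the question
SIM = [1.00 0.64 0.77 0.83;
       0.64 1.00 0.69 0.91;
       0.77 0.69 1.00 0.75;
       0.83 0.91 0.75 1.00];

DSIM = squareform(1 - SIM);      % pairwise dissimilarity vector
Z    = linkage(DSIM, 'average'); % (m-1)-by-3 matrix, here 3-by-3

% Each row of Z: [cluster index, cluster index, merge height]
disp(Z);

% Third column: dissimilarity at which each merge takes place
heights = Z(:, 3);
disp(heights.');
```

With average linkage, vectors 2 and 4 merge first at 0.09, then vectors 1 and 3 at 0.23, and finally the two pairs at 0.2725, which is the mean of the four cross-pair dissimilarities (0.36, 0.17, 0.31, 0.25). These heights are what the cutoff is compared against in the next step.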

You can plot the dendrogram with:

dendrogram(Z)

However, you want to split the groups according to a threshold, so:

c = 0.1;

This is the dissimilarity at which to cut. Here it means that two groups will be connected if they have a similarity higher than 0.9.

T = cluster(Z,'cutoff',c,'criterion','distance')

The result of T in that case is:

T =
  1
  2
  3
  2

This means that at this level your vectors 1, 2, 3, 4 (call them A B C D) are organized in 3 groups:

  1. A
  2. B,D
  3. C

Similarly, with c = 0.3 (i.e. a similarity of 0.7):

T = 1 1 1 1

So there is just one group here.
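To compare several thresholds at once, something like the following might help. It is only a sketch: it assumes the SIM matrix from the question, and simThresholds and numGroups are illustrative names. The list of thresholds is my own choice (0.95 and 0.7 appear in the question, 0.9 in the example above). Each similarity threshold s is converted to a dissimilarity cutoff c = 1 - s.

```matlab
% Similarity matrix copied from the question
SIM = [1.00 0.64 0.77 0.83;
       0.64 1.00 0.69 0.91;
       0.77 0.69 1.00 0.75;
       0.83 0.91 0.75 1.00];

Z = linkage(squareform(1 - SIM), 'average');

simThresholds = [0.95 0.9 0.7];          % illustrative similarity thresholds
numGroups = zeros(size(simThresholds));  % number of groups per threshold

for k = 1:numel(simThresholds)
    c = 1 - simThresholds(k);            % similarity -> dissimilarity cutoff
    T = cluster(Z, 'cutoff', c, 'criterion', 'distance');
    numGroups(k) = numel(unique(T));
    fprintf('similarity > %.2f: %d group(s)\n', simThresholds(k), numGroups(k));
end
```

For this SIM, the loop should report 4, 3 and 1 groups: at 0.95 no pair is similar enough to be merged, so every vector stays on its own.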

To show this on the dendrogram, first calculate the number of groups:

num_grp = numel(unique(T));

Then:

dendrogram(Z,num_grp,'labels',{'A','B','C','D'})

In that case the dendrogram won't display all the leaves, because the second argument limits the number of leaf nodes shown to the number of groups.
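If you would rather keep all four leaves visible and still see where the cut falls, one option is to draw the full tree and mark the cutoff on it. This is a sketch of an alternative, not the only way to do it: passing 0 as the second argument shows every leaf, 'ColorThreshold' colors the subtrees that lie below the cutoff, and the dashed line is added by hand. It assumes the SIM matrix from the question and c = 0.1 as above.

```matlab
% Similarity matrix copied from the question
SIM = [1.00 0.64 0.77 0.83;
       0.64 1.00 0.69 0.91;
       0.77 0.69 1.00 0.75;
       0.83 0.91 0.75 1.00];

Z = linkage(squareform(1 - SIM), 'average');
c = 0.1;                               % dissimilarity cutoff (similarity 0.9)

figure;
% P = 0 displays all leaves; merges below c get their own color
H = dendrogram(Z, 0, 'Labels', {'A','B','C','D'}, 'ColorThreshold', c);

% Mark the cutoff with a dashed horizontal line
hold on;
plot(xlim, [c c], 'k--');
hold off;
ylabel('Dissimilarity (1 - similarity)');
```

Every link that sits below the dashed line joins vectors that end up in the same group at that threshold.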

rayryeng
Pieter V.