The standard way to label the leaves of a dendrogram is shown here.

Because my data set is large, I want to label higher-level clusters instead of individual data points. For instance, if a cluster contains 12 data points, 7 of which are labeled "Label1", I want to label that cluster "Label1". In other words, I want to plot a tree with a fixed number of predefined clusters:

LargeDataSet = [...]; % some m x n data matrix
dataLabel = [...]; % m x 1 vector labeling each row of LargeDataSet    
N = 10; % number of clusters I want
tree = linkage(LargeDataSet,'average'); 
LabelVector = ?; % I don't know how to create this vector 
dendrogram(tree,N,'Labels',LabelVector);

Essentially, I want to know how to build LabelVector from tree so that each entry is the label from dataLabel that occurs most often in the corresponding cluster.
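For a single cluster, "the most frequent label" can be found with `unique` plus `mode`. The sketch below uses the 12-point example from above as hypothetical toy data and assumes the labels are stored as a cell array of character vectors.

```matlab
% Hypothetical cluster: 12 points, 7 labeled 'Label1' and 5 labeled 'Label2'
clusterLabels = [repmat({'Label1'}, 7, 1); repmat({'Label2'}, 5, 1)];

% u holds the distinct labels (sorted); idx maps each point to its entry in u
[u, ~, idx] = unique(clusterLabels);

% mode(idx) is the most frequent index; ties go to the smallest index,
% i.e. the alphabetically first label
winner = u{mode(idx)};
disp(winner)   % displays Label1
```

Repeating this for every cluster produced by `dendrogram` gives one label per leaf node.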

Thanks for reading all the way through! I know this may not be the best description of my problem.

Asked by hkf

OK, I have figured it out:

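LargeDataSet = [...]; % some m x n data matrix
dataLabel = [...]; % m x 1 cell array of labels, one per row of LargeDataSet
N = 10; % number of clusters I want
tree = linkage(LargeDataSet,'average');
% First output is the line handles (not distances); T gives the leaf node
% (cluster) of each data point; outperm lists leaf nodes in plot order
[~,T,outperm] = dendrogram(tree,N);
L = cell(N,1);
for i = 1:N
    B = dataLabel(T==i);          % labels of the points in leaf node i
    [aa,~,cc] = unique(B);        % aa: distinct labels, aa(cc) equals B
    dd = mode(cc);                % index of the most frequent label
    L(i) = cellstr(aa(dd));       % store it as the label of leaf node i
end
set(gca,'XTickLabel',L(outperm))  % tick k shows leaf node outperm(k)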
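To try the approach without the real data, the same steps can be run on a small synthetic data set. This is a sketch under stated assumptions: the two Gaussian blobs and the labels 'A' and 'B' are hypothetical toy data, and `linkage` and `dendrogram` require the Statistics and Machine Learning Toolbox.

```matlab
% Synthetic toy data (hypothetical): two well-separated 2-D blobs, 20 points each
rng(0);                                         % make randn reproducible
X = [randn(20, 2); randn(20, 2) + 6];
lab = [repmat({'A'}, 20, 1); repmat({'B'}, 20, 1)];

N = 4;                                          % number of leaf nodes to show
Z = linkage(X, 'average');
[~, T, outperm] = dendrogram(Z, N);             % T: leaf node of each point

L = cell(N, 1);
for i = 1:N
    [u, ~, idx] = unique(lab(T == i));          % labels present in leaf node i
    L(i) = u(mode(idx));                        % keep the most frequent one
end
set(gca, 'XTickLabel', L(outperm));             % ticks follow plot order
```

Using `outperm` directly avoids parsing the tick labels back into numbers, which depends on whether `XTickLabel` comes back as a character matrix or a cell array.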