
I want to test hierarchical clustering with the "centroid" and "median" methods. I have the following R code:

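library(dendextend)

# Load the data and drop the species column before clustering
iris <- datasets::iris
iris2 <- iris[, -5]
species_labels <- iris[, 5]

# Hierarchical clustering with the "centroid" method
d_iris <- dist(iris2)
hc_iris <- hclust(d_iris, method = "centroid")

# Convert to a dendrogram and color the branches for k = 3 clusters
dend <- as.dendrogram(hc_iris)
dend <- color_branches(dend, k = 3)

plot(dend,
     main = "Clustered Iris data set
     (the labels give the true flower species)",
     horiz = TRUE, nodePar = list(cex = .007))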

The plot shows more colored clusters than the k = 3 passed to color_branches.

[image: the resulting dendrogram]

However, if I call cutree directly on hc_iris, the object returned by hclust:

table(cutree(hc_iris, k=3), iris$Species)

I get 3 clusters, as expected:

   setosa versicolor virginica
     50          0         0
     0         50        48
     0          0         2

But if I apply cutree to the dendrogram instead, I get 34 clusters:

table(cutree(as.dendrogram(hc_iris), 3), iris$Species)

   setosa versicolor virginica
   4          0         0
   3          0         0
   3          0         0
   6          0         0
   2          0         0
   3          0         0
  10          0         0
   5          0         0
   4          0         0
   1          0         0
   1          0         0
   2          0         0
   1          0         0
   3          0         0
   2          0         0
   0          3         0
   0         27         9
   0         12         0
   0          2         0
   0          3         0
   0          1         3
   0          2         0
   0          0         9
   0          0         3
   0          0         6
   0          0         2
   0          0         3
   0          0         3
   0          0         3
   0          0         2
   0          0         2
   0          0         2
   0          0         1
   0          0         2

This happens with both "centroid" and "median" methods.
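One thing that may be worth checking (an assumption on my side, not something confirmed in this thread): "centroid" and "median" linkage are known to be able to produce inversions, meaning a later merge can happen at a lower height than an earlier one. If the dendrogram-based cut relies on heights while cutree on the hclust object follows the merge order, that could explain the difference. The sketch below uses a made-up three-point data set (`pts` and `hc_toy` are hypothetical names) to show such an inversion, then runs the same check on the iris tree.

```r
# Toy data (hypothetical): three points forming a nearly equilateral triangle.
pts <- rbind(a = c(0, 0), b = c(1, 0), c = c(0.5, 0.9))
hc_toy <- hclust(dist(pts), method = "centroid")

# Merge heights, listed in the order the merges happened.
print(hc_toy$height)

# TRUE means the heights are not non-decreasing, i.e. the tree has an inversion.
has_inversion <- is.unsorted(hc_toy$height)
print(has_inversion)

# Same check on the iris tree from the question (outcome not asserted here).
hc_iris <- hclust(dist(datasets::iris[, -5]), method = "centroid")
print(is.unsorted(hc_iris$height))
```

If the last line prints TRUE for the iris tree, the merge heights are not monotone, and any cut that depends on a height threshold would be ambiguous for this tree.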

G5W
zaig
I think what you are doing _should_ work. Bug? A workaround is to change your color_branches statement to `dend <- color_branches(dend, clusters = cutree(hc_iris, 3))` – G5W Dec 31 '18 at 13:20
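A minimal sketch of the workaround from the comment above, with one adjustment that is an assumption based on my reading of the dendextend documentation: the `clusters` argument of color_branches is expected in the order of the dendrogram leaves, while cutree on an hclust object returns memberships in the original row order, so the vector is reindexed with order.dendrogram first. The names `clusters_data_order`, `clusters_leaf_order`, and `dend_colored` are only illustrative.

```r
library(dendextend)

hc_iris <- hclust(dist(datasets::iris[, -5]), method = "centroid")
dend <- as.dendrogram(hc_iris)

# Cluster memberships from the hclust object, in the original row order.
clusters_data_order <- stats::cutree(hc_iris, k = 3)

# Reorder so that element i belongs to the i-th leaf of the dendrogram.
clusters_leaf_order <- clusters_data_order[order.dendrogram(dend)]

# Color branches from the explicit membership vector instead of passing k.
dend_colored <- color_branches(dend, clusters = clusters_leaf_order)

plot(dend_colored, horiz = TRUE, nodePar = list(cex = .007))
```

Passing the memberships explicitly sidesteps the cut on the dendrogram itself, so the coloring follows the three clusters that cutree reports for hc_iris.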
