I want to test hierarchical clustering with the "centroid" and "median" methods. I have the following R code:
library(dendextend)
iris <- datasets::iris
iris2 <- iris[,-5]
species_labels <- iris[,5]
d_iris <- dist(iris2)
hc_iris <- hclust(d_iris, method = "centroid")
dend <- as.dendrogram(hc_iris)
dend <- color_branches(dend, k=3)
plot(dend,
main = "Clustered Iris data set
(the labels give the true flower species)",
horiz = TRUE, nodePar = list(cex = .007))
The number of clusters in the plot seems to be larger than the k passed to the color_branches function.
However, if I call cutree directly on hc_iris, which is the result of the hierarchical clustering:
table(cutree(hc_iris, k=3), iris$Species)
I get 3 clusters, as expected:
    setosa versicolor virginica
  1     50          0         0
  2      0         50        48
  3      0          0         2
But if I apply the cutree function to the dendrogram, the number of clusters is 34:
table(cutree(as.dendrogram(hc_iris), 3), iris$Species)
setosa versicolor virginica
4 0 0
3 0 0
3 0 0
6 0 0
2 0 0
3 0 0
10 0 0
5 0 0
4 0 0
1 0 0
1 0 0
2 0 0
1 0 0
3 0 0
2 0 0
0 3 0
0 27 9
0 12 0
0 2 0
0 3 0
0 1 3
0 2 0
0 0 9
0 0 3
0 0 6
0 0 2
0 0 3
0 0 3
0 0 3
0 0 2
0 0 2
0 0 2
0 0 1
0 0 2
This happens with both "centroid" and "median" methods.
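One thing that may be worth checking is whether the merge heights are monotone. The ?hclust help page notes that "centroid" and "median" do not lead to a monotone distance measure, so the tree can contain inversions, and a cut made by height on the dendrogram may then behave differently from a cut made by merge order on the hclust object. I have not confirmed that this is the cause here. The sketch below uses base R only; has_inversions is a hypothetical helper name, and "complete" linkage is included purely as a monotone reference.

```r
# Same distance matrix as in the question.
d_iris <- dist(datasets::iris[, -5])

hc_centroid <- hclust(d_iris, method = "centroid")
hc_median   <- hclust(d_iris, method = "median")
hc_complete <- hclust(d_iris, method = "complete")  # reference: monotone linkage

# Hypothetical helper: TRUE if any merge happens at a lower height
# than an earlier merge, i.e. the tree has at least one inversion.
has_inversions <- function(hc) is.unsorted(hc$height)

# Print the check for each linkage method.
print(c(centroid = has_inversions(hc_centroid),
        median   = has_inversions(hc_median),
        complete = has_inversions(hc_complete)))
```

If the centroid and median trees report TRUE while the complete-linkage tree reports FALSE, that would be consistent with the problem being tied to non-monotone heights rather than to the k argument itself.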