Here are three points and a hierarchical clustering using hclust in R with the "centroid" method.

points <- data.frame(x = c(0, 1, 0.75),
                     y = c(0, 0, 1))
centroid <- hclust(dist(points), method = "centroid")
plot(centroid)

The resulting dendrogram correctly merges the first and second points. (The distance is 1.) The centroid of the first two points is at (0.5, 0).

The third point merges at a height of 0.8903882, creating an inversion (or reversal as some call it). In fact, the third point is at a distance of 1.030776 from the centroid, so there should be no inversion.
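The two numbers quoted above can be checked directly. The following is a minimal sketch; the names `centroid_12`, `dist_to_centroid`, and `heights` are introduced here for illustration only and are not part of the original code.

```r
# Same three points as in the question
points <- data.frame(x = c(0, 1, 0.75),
                     y = c(0, 0, 1))

# Geometric centroid of the first two points: (0.5, 0)
centroid_12 <- colMeans(points[1:2, ])

# Euclidean distance from the third point to that centroid
dist_to_centroid <- sqrt(sum((unlist(points[3, ]) - centroid_12)^2))
print(dist_to_centroid)   # about 1.030776

# Merge heights reported by hclust, in merge order
heights <- hclust(dist(points), method = "centroid")$height
print(heights)            # second height is about 0.8903882
```

The second merge height is lower than the first, which is the inversion visible in the dendrogram, even though the hand-computed distance to the centroid is greater than 1.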

What am I missing here?

Sean Raleigh

1 Answer

It is mainly because of the method you have used, which is centroid. Choose a different (monotonic) method, such as single linkage, complete linkage, average linkage, weighted average linkage, or Ward's linkage.

Aditya Lahiri
  • Hi, @aditya-lahiri. I think perhaps you have misunderstood my question. I realize there are multiple choices for linkage, and I'm not saying that "centroid" is desirable. This question is about why the implementation of the centroid method in hclust appears to be giving the wrong answer. Either there is a bug in the function or else I'm missing something mathematical about the way centroid linkage is computed. – Sean Raleigh Dec 04 '18 at 00:15
  • @SeanRaleigh. The output you are getting is not incorrect and the function has no bugs. The inversions you are seeing are an artifact of using a non-monotonic method such as centroid linkage. This happens because, unlike in monotonic methods, the merge dissimilarity in these methods is not guaranteed to be non-decreasing from one iteration to the next. – Aditya Lahiri Dec 04 '18 at 00:29
  • Sorry, but you're still misunderstanding the question. Maybe I worded it poorly. I know that centroid linkage can cause inversion. I know that centroid linkage is non-monotonic in general. But it's not guaranteed to create inversions. And in this case specifically, it should not. Please do the math by hand and verify that. I repeat "...the third point is at a distance of 1.030776 from the centroid." That is greater than the first distance of 1. – Sean Raleigh Dec 05 '18 at 04:02
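
One way to do the hand calculation requested in the comments is a rough sketch under an assumption: that `hclust` updates distances with the Lance-Williams centroid recurrence, which for two singleton clusters i and j reduces to d(k, ij) = d(k, i)/2 + d(k, j)/2 - d(i, j)/4. That recurrence gives the true centroid distance only when it is applied to squared Euclidean distances. The helper `lw_centroid` and the variable names below are hypothetical, introduced only for this illustration.

```r
# Pairwise Euclidean distances between the three points in the question
d12 <- 1                      # (0, 0) to (1, 0)
d13 <- sqrt(0.75^2 + 1^2)     # (0, 0) to (0.75, 1), equals 1.25
d23 <- sqrt(0.25^2 + 1^2)     # (1, 0) to (0.75, 1), about 1.030776

# Hypothetical helper: Lance-Williams centroid update for two singletons
lw_centroid <- function(dik, djk, dij) 0.5 * dik + 0.5 * djk - 0.25 * dij

# Applied to plain distances, the update reproduces the reported height
h_unsquared <- lw_centroid(d13, d23, d12)
print(h_unsquared)            # about 0.8903882

# Applied to squared distances, it gives the squared centroid distance
h_squared <- lw_centroid(d13^2, d23^2, d12^2)
print(h_squared)              # 1.0625
print(sqrt(h_squared))        # about 1.030776
```

If that assumption holds, the height of 0.8903882 comes from feeding unsquared distances into the update, and calling `hclust(dist(points)^2, method = "centroid")` should give merge heights of 1 and 1.0625 with no inversion. It is worth checking `?hclust` to confirm how the centroid method is meant to be used.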