I have plotted a dendrogram using the complete-linkage (maximum) agglomeration method.

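# Complete-linkage ("maximum") agglomerative clustering
hc <- hclust(distance_matrix, method = "complete")
# Dendrogram with leaves labeled by the known class
plot(hc, hang = 0, labels = ilpd_df$Class)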

Q1) How can I find the accuracy of this agglomeration method?

Q2) How should one comment on the sensitivity of test data to the agglomeration method?

Thank you =)

  • I don't think that either 'accuracy' or 'sensitivity' is a meaningful word in this context. However, if you want to measure the appropriateness of a hierarchical clustering, then the simplest method may be the `cophenetic` function. See `help(cophenetic)`. – Stephen Henderson Oct 10 '19 at 13:49
  • This doesn't appear to be a specific programming question that's appropriate for Stack Overflow. If you have general questions about the appropriate use of various statistical methods, then you should ask such questions over at [stats.se] instead. You are more likely to get better answers there. – MrFlick Oct 10 '19 at 14:46
  • @MrFlick please encourage *migration*, not duplication, of questions. If you have enough reputation, you can also flag as off-topic, belongs to, CV. – Has QUIT--Anony-Mousse Oct 12 '19 at 11:11
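
The `cophenetic` suggestion in the comments can be sketched roughly as follows. This is only an illustration on made-up data: `toy`, `d`, and `coph_cor` are hypothetical names, and the random matrix stands in for the asker's `distance_matrix`, which is not shown. The cophenetic correlation measures how well the dendrogram preserves the original pairwise distances; comparing it across linkage methods gives one rough view of how much the result depends on the agglomeration method.

```r
# Toy data standing in for the real observations (assumption: not the asker's data)
set.seed(1)
toy <- matrix(rnorm(40), nrow = 10)
d <- dist(toy)

# Cophenetic correlation for several linkage methods:
# correlation between the original distances and the dendrogram-implied distances
methods <- c("complete", "average", "single")
coph_cor <- sapply(methods, function(m) {
  hc_m <- hclust(d, method = m)
  cor(d, cophenetic(hc_m))
})

print(round(coph_cor, 3))
```

Values closer to 1 indicate that the tree distorts the original distances less; this says nothing about agreement with class labels.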

1 Answer

Cluster analysis is exploratory, not predictive.

Accuracy makes sense when predicting, but not so much when exploring data. You won't be able to just apply this clustering method to a new data point!

The closest analogue to accuracy is probably the Rand index, if you actually have labeled data. It is the accuracy of predicting, for each pair of points, whether the two points have the same label or not.
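
As a minimal sketch of that pairwise definition, the Rand index can be computed in base R as below. The helper `rand_index` is a hypothetical name written for illustration, not a function from the `stats` package, and the toy vectors are made up.

```r
# Rand index: fraction of point pairs on which two partitions agree,
# i.e. both put the pair together or both put it apart.
rand_index <- function(labels_a, labels_b) {
  stopifnot(length(labels_a) == length(labels_b), length(labels_a) >= 2)
  same_a <- outer(as.character(labels_a), as.character(labels_a), "==")
  same_b <- outer(as.character(labels_b), as.character(labels_b), "==")
  pairs <- upper.tri(same_a)  # count each unordered pair once
  mean(same_a[pairs] == same_b[pairs])
}

truth <- c("a", "a", "b", "b")
ri_same <- rand_index(truth, c(1, 1, 2, 2))  # identical partitions: 1
ri_diff <- rand_index(truth, c(1, 2, 1, 2))  # agrees on 2 of 6 pairs: 1/3

# With the objects from the question (assumption: cutting the tree into
# as many clusters as there are classes):
# clusters <- cutree(hc, k = length(unique(ilpd_df$Class)))
# rand_index(ilpd_df$Class, clusters)
```

Cluster numbers are arbitrary, so the index compares only which points are grouped together, not the label values themselves.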

Has QUIT--Anony-Mousse