
Say you have the following matrix:

    V1  V2  V3  V4  V5
1   0   0   0   0   1
2   0   0   0   1   1
3   0   0   1   1   1
4   0   0   1   1   0
5   1   0   0   0   0
6   1   1   1   0   0
7   0   1   1   0   0
8   0   1   1   0   0
9   0   1   1   1   0
10  1   1   1   0   1
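
(For reproducibility: `cmat` is my own data; one way to construct it directly from the table above is)

    cmat <- matrix(c(0,0,0,0,1,
                     0,0,0,1,1,
                     0,0,1,1,1,
                     0,0,1,1,0,
                     1,0,0,0,0,
                     1,1,1,0,0,
                     0,1,1,0,0,
                     0,1,1,0,0,
                     0,1,1,1,0,
                     1,1,1,0,1),
                   nrow = 10, byrow = TRUE,
                   dimnames = list(1:10, paste0("V", 1:5)))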

Then you build a dendrogram however you like; here is what I did, where `cmat` is the matrix above:

    distance <- dist(cmat, method = "euclidean")     # pairwise Euclidean distances between rows
    cluster <- hclust(distance, method = "average")  # average-linkage hierarchical clustering
    plot(cluster, hang = -1)                         # hang = -1 hangs all labels at the same level

[Figure: cluster dendrogram of the custom matrix "cmat"]

Basically, I want to know which features cause which breaks. Say we cut the tree at a height of 1.5; we can view this with the following code:

    dnd <- as.dendrogram(cluster)  # convert the hclust result to a dendrogram
    plot(cut(dnd, h = 1.5)$upper, main = "Upper tree of cut at h=1.5")

which produces: [Figure: upper tree of the dendrogram cut at h=1.5]

But notice how each branch only gets an arbitrary name, "batch". I want to know:

Which of the 5 features causes that first break? Then the next? Any ideas? How do I code this? Thanks!!

StudentOfScience
  • `pts <- identify(cluster); pts` and then looking at the selected parts of `pts`, like `pts[[1]]` etc., might actually help here. Besides, this book here: http://www.amazon.com/Finding-Groups-Data-Introduction-Probability/dp/0471735787 explains all the magic behind it. Don't get confused by the old computer printouts in the book; to me it's very well written and REALLY explains what happens, just like looking at the source code, which would also be an option. – Matt Bannert Oct 18 '13 at 08:00
  • my matrix was huge; it crashed when I used identify() :( – StudentOfScience Oct 18 '13 at 08:50

1 Answer


Short answer: all of them. Euclidean distance is defined as `sqrt(sum((x - y)^2))`, so every feature contributes to every pairwise distance. This is NOT a decision tree that splits on a single feature at a time.
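To see this concretely, here is a quick check (reusing `cmat` from the question) that the distance between rows 1 and 2 comes from summing over all five columns:

    x <- cmat[1, ]; y <- cmat[2, ]
    sqrt(sum((x - y)^2))           # Euclidean distance computed by hand
    as.matrix(dist(cmat))[1, 2]    # the same value as reported by dist()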

If you want a simple, decision-tree-style explanation, I suggest that you (see the sketch after this list):

  1. Produce a flat clustering by cutting the tree at the desired height

  2. Train a decision tree on the resulting clusters

  3. Analyze the decision tree.
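A minimal sketch of these three steps, assuming the `rpart` package is available (any decision-tree package would do; the loosened `minsplit`/`cp` settings are only there because the example data has just 10 rows):

    library(rpart)

    # 1. Flat clustering: cut the tree at the desired height
    groups <- cutree(cluster, h = 1.5)

    # 2. Train a decision tree to predict cluster membership from the features
    df <- data.frame(cmat, cluster = factor(groups))
    fit <- rpart(cluster ~ ., data = df, method = "class",
                 control = rpart.control(minsplit = 2, cp = 0))

    # 3. Inspect the splits: each node shows which feature separates which clusters
    print(fit)

The printed tree then tells you which `V*` column drives each split, which is the "which feature causes that break" reading you were after (keeping in mind the tree is only an approximation of the distance-based clustering).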

Has QUIT--Anony-Mousse