
While stats::cutree() takes an hclust object and cuts it into a given number of clusters, I'm looking for a function that takes a given number of elements and sets k accordingly. In other words: return the first cluster with n elements.

For example: Searching for the first cluster with n = 9 objects.

library(psych)
data(bfi)
x <- bfi 
hclust.res <- hclust(dist(abs(cor(na.omit(x)))))
cutree.res <- cutree(hclust.res, k = 2)
cutree.table <- table(cutree.res)
cutree.table

# no cluster with n = 9 elements
> cutree.table
cutree.res
 1  2 
23  5 

while k = 3 yields

cutree.res <- cutree(hclust.res, k = 3)
cutree.table <- table(cutree.res)

# three clusters, of which cluster 2 contains the required number of objects
> cutree.table
cutree.res
 1  2  3 
14  9  5 

Is there a more convenient way than iterating over k?

Thanks

Comfort Eagle

1 Answer


You can easily write code for this yourself that makes only one pass over the dendrogram rather than calling cutree in a loop.

Just replay the merges one by one and note the size of each newly formed cluster. Then keep the one that best matches what you are looking for.

Note that there might be no such solution. For example, on the one-dimensional data set -11, -10, +10, +11, cutting the dendrogram in merge order only ever yields clusters with 1, 2, or 4 elements, never 3. So you'll have to handle this case, too.
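The single pass described above could be sketched roughly as follows. This is only an illustration, not an existing function: first_cluster_of_size is a hypothetical name, and the sketch relies on the documented layout of the merge component of an hclust object (negative entries are single observations, positive entries refer to earlier merge rows). It looks for an exact match of size n >= 2 and returns NULL when no merge produces one.

```r
# Hypothetical helper (not part of stats): scan the merges of an hclust
# object once and report the first merge that creates a cluster of size n.
first_cluster_of_size <- function(hc, n) {
  merge <- hc$merge            # row i lists the two clusters joined at step i
  n_obs <- nrow(merge) + 1L
  sizes <- integer(nrow(merge))
  for (i in seq_len(nrow(merge))) {
    # Negative entries are single observations, positive entries point to
    # the cluster formed in that earlier row.
    a <- merge[i, 1]
    b <- merge[i, 2]
    sizes[i] <- (if (a < 0) 1L else sizes[a]) + (if (b < 0) 1L else sizes[b])
    if (sizes[i] == n) {
      k <- n_obs - i           # number of clusters remaining after i merges
      # Walk down to any one leaf of the new cluster so we can identify it.
      j <- a
      while (j > 0) j <- merge[j, 1]
      leaf <- -j
      labels <- cutree(hc, k = k)
      return(list(k = k, members = which(labels == labels[leaf])))
    }
  }
  NULL                         # no merge produces a cluster of exactly n
}
```

Called as first_cluster_of_size(hclust.res, 9), this would return the largest k at which a 9-element cluster exists, along with the indices of its members. The same cluster stays intact for smaller k until it is merged into something bigger. On the four-point example above, asking for n = 3 returns NULL.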

Has QUIT--Anony-Mousse