I have a data frame of more than 5000 observations. In my attempt at analysing the data using hierarchical clustering, I ended up with 8 clusters, where some clusters contain either a few thousand or a few hundred observations.

# Cut tree into 8 groups
cutree_hclust <- cutree(hclust.unsupervised, k = 8)

# Number of members in each cluster
table(cutree_hclust)

cutree_hclust
    1    2    3    4    5    6    7    8 
  486   61   14    3   15    2    9    5 

To see which combination of variables each observation has in the different clusters, I thought it might be an idea to turn the 8 clusters into separate data frames, so I can analyse them one by one. This is because I have no idea which rows are in which clusters, and therefore don't know what the pattern in the overall data frame (Complete_df) is.

However, how can I make these new dataframes?

I can see which rows are in a given cluster with, for example:

rownames(MY_df)[cutree_hclust == 7]

[1] "65"  "21"  "21"  "70"  "101" "104" "112" "673"
[9] "651"

But if I type

h_clust <- as.data.frame(rownames(MY_df)[cutree_hclust == 7])

I only get a list of the row names in this cluster; none of the other columns are included.

So how can I turn a cluster into a data frame without having to type the row/column sequence with square brackets 5000 times?

BloopFloopy
  • Perhaps you need `rn <- rownames(Complete_df)[cutree_hclust == 7]; subset(Complete_df, rn %in% rownames(Complete_df))` – akrun May 05 '18 at 14:28
  • It doesn't seem to work. I did the following, so that I would get a df: `rn <- rownames(Complete_df)[cutree_hclust == 7]; rn2 <- subset(Complete_df, rn %in% rownames(Complete_df))`. However, I get the warning message: `Length of logical index must be 1 or 5912, not 9`. Also, the rows for `cutree_hclust == 7` have not been subsetted: I still get a df, `rn2`, with all 5000 observations – BloopFloopy May 05 '18 at 15:13
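
For what it's worth, the warning in the last comment is consistent with the `%in%` operands being the wrong way round: `rn %in% rownames(Complete_df)` has one element per row name in the cluster (9 here), whereas a row index needs one element per row of the data frame. Below is a minimal sketch of three ways to get the cluster rows with all columns kept. It uses the built-in `USArrests` data as a stand-in, and `toy_df`, `hc`, `cl` and the other names are made up for illustration. It also assumes the cluster vector is in the same row order as the data frame being subset, which holds when that data frame is the one the distance matrix was computed from.

```r
# Toy stand-in for the real data; USArrests ships with base R.
# All object names here are hypothetical, not from the original post.
toy_df <- USArrests
hc <- hclust(dist(scale(toy_df)))
cl <- cutree(hc, k = 4)  # integer vector, one entry per row of toy_df

# Option 1: index the rows with the logical vector directly.
# Leaving the column index empty keeps every column.
cluster2_df <- toy_df[cl == 2, , drop = FALSE]

# Option 2: the %in% test from the comment with its operands swapped,
# so the logical vector has one element per row of the data frame.
rn <- rownames(toy_df)[cl == 2]
cluster2_sub <- subset(toy_df, rownames(toy_df) %in% rn)

# Option 3: split() returns a named list with one data frame per cluster.
cluster_dfs <- split(toy_df, cl)
names(cluster_dfs)   # cluster labels as character strings
cluster_dfs[["2"]]   # the data frame for cluster 2
```

Applied to the question, the equivalents would presumably be `Complete_df[cutree_hclust == 7, ]` for a single cluster and `split(Complete_df, cutree_hclust)` for all eight at once, provided `Complete_df` has the same rows in the same order as the data that went into `hclust()`.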
