I have a dataframe with more than 5000 observations. I analysed it with hierarchical clustering and cut the tree into 8 clusters, some of which contain a few hundred or more than a thousand observations.
# Cut tree into 8 groups
cutree_hclust <- cutree(hclust.unsupervised, k = 8)
# Number of members in each cluster
table(cutree_hclust)
This gives the size of each cluster:
cutree_hclust
   1    2    3    4    5    6    7    8
 867   61   14  310 1135  432  119    5
To see which combination of variable values each observation has in the different clusters, I thought it might be an idea to turn the 8 clusters into separate dataframes so that I can analyse them individually. I have no idea which rows ended up in which cluster, and therefore I don't know what the pattern in the overall dataframe (Complete_df) is.
However, how can I make these new dataframes?
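To make clearer what I am after, here is a minimal sketch on stand-in data. The built-in USArrests data set plays the role of Complete_df, and the names toy_df, toy_hclust, toy_groups and cluster_dfs are made up for illustration. It assumes that the rows of the dataframe are in the same order as the vector returned by cutree(), which I am not sure holds for my real data.

```r
# Stand-in data (assumption): USArrests plays the role of Complete_df
toy_df <- USArrests

# Hierarchical clustering on the scaled data, then cut into 4 groups
toy_hclust <- hclust(dist(scale(toy_df)), method = "complete")
toy_groups <- cutree(toy_hclust, k = 4)

# split() returns a named list with one dataframe per cluster.
# This only makes sense if the rows of toy_df are in the same
# order as the elements of toy_groups.
cluster_dfs <- split(toy_df, toy_groups)

# Number of rows in each cluster's dataframe
sapply(cluster_dfs, nrow)

# First rows of the dataframe for cluster 2, with all columns kept
head(cluster_dfs[["2"]])
```

Something like cluster_dfs[["7"]] on my real data would then be the dataframe I want for cluster 7.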
I can see what I assume to be the rows in the different clusters with, for example:
rownames(MY_df)[cutree_hclust == 7]
[1] "60" "72" "92" "97" "110" "210" "211" "267"
[9] "565"
But if I type:
h_clust <- as.data.frame(rownames(MY_df)[cutree_hclust == 7])
I only get the names of the rows that are in this cluster (as a single column); none of the other columns are included.
How can I select these specific rows in my dataframe called Complete_df, so that I can see what the overall variable combination is for each cluster?
I have tried the following:
rn <- rownames(MY_df)[cutree_hclust == 7]; subset(Complete_df, rn %in% rownames(MY_df))
(this is taken from the question "R how to select several rows to make a new dataframe")
and
Clust_7 <- rownames(MY_df)[cutree_hclust == 7]
Clust_7_df <- data.frame(matrix(unlist(Clust_7), nrow=9, byrow=T))
Neither of the above attempts worked.
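Looking at the first attempt again, I suspect the %in% test is the wrong way round: it checks rn against the row names, which gives a logical vector with one element per name in rn rather than one element per row of the dataframe. Below is a small sketch of what I think the selection should look like, again on the built-in USArrests data as a stand-in (toy_df, toy_groups and the clust_2_* names are hypothetical). It assumes the clustered dataframe and the dataframe being subset share the same row names, which I have not verified for MY_df and Complete_df.

```r
# Stand-in data (assumption): USArrests plays the role of both MY_df and Complete_df
toy_df <- USArrests
toy_groups <- cutree(hclust(dist(scale(toy_df))), k = 4)

# Row names of the observations assigned to cluster 2
rn <- rownames(toy_df)[toy_groups == 2]

# Option 1: index the dataframe directly by row name, keeping all columns
clust_2_df <- toy_df[rn, , drop = FALSE]

# Option 2: test the dataframe's row names against rn (not the other way round)
clust_2_alt <- toy_df[rownames(toy_df) %in% rn, , drop = FALSE]

# Both options should select the same rows
all(rownames(clust_2_df) == rownames(clust_2_alt))
```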
I look forward to hearing back from anyone who can help - as I have not been able to figure this out for myself :-)