I am working on a dataset that has 20,000 variables. All of them are measured in the same unit, but because there are so many, I decided to cluster the variables to obtain groups of related variables.
I decided that hierarchical clustering was a good option, and I used the following code (assume D is the data frame):
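# Euclidean distances between the rows of D
d <- dist(D, method = "euclidean")
# Agglomerative clustering with Ward's method
clust1 <- hclust(d, method = "ward.D")
plot(clust1)
# Cut the tree into 150 groups (the original call used `fit`, which is never defined)
groups <- cutree(clust1, k = 150)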
The dendrogram I obtained is the following:
As you can see, the variable names make it very hard to see anything useful here, but I don't know how to stop R from displaying the variable names on the dendrogram.
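For reference, here is a minimal sketch of one way to draw the tree without leaf labels. It relies on the `labels` argument of `plot()` for `hclust` objects, which is documented as plotting no labels at all when set to `FALSE`. The toy matrix `D_toy` and the names `d_toy` and `clust_toy` are made up for illustration and stand in for the real `D`:

```r
# Toy stand-in for D (an assumption): 30 rows to cluster, 5 measurements each.
set.seed(1)
D_toy <- matrix(rnorm(30 * 5), nrow = 30,
                dimnames = list(paste0("var", 1:30), NULL))

d_toy <- dist(D_toy, method = "euclidean")
clust_toy <- hclust(d_toy, method = "ward.D")

# labels = FALSE suppresses the leaf labels; the tree itself is unchanged.
plot(clust_toy, labels = FALSE)
```

With the real data the call would presumably be `plot(clust1, labels = FALSE)`.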
I also have another question: I used the function "cutree" to build the groups, but as far as I could tell, it has a limitation and can build at most 150 groups. Is there any other way to build the groups without this limitation?
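As a quick check of that limit, the sketch below cuts a toy tree into 500 groups. The documentation for `cutree` describes `k` as the desired number of groups and `h` as a height at which to cut, and does not mention a cap of 150, so the limit seen above may have another cause. The data and the names `D_toy`, `clust_toy`, `groups_k`, `h_cut` and `groups_h` are hypothetical:

```r
# Toy stand-in for D (an assumption): 1000 rows to cluster, 4 measurements each.
set.seed(42)
D_toy <- matrix(rnorm(1000 * 4), nrow = 1000)
clust_toy <- hclust(dist(D_toy, method = "euclidean"), method = "ward.D")

# Ask for more than 150 groups directly.
groups_k <- cutree(clust_toy, k = 500)
length(unique(groups_k))

# Alternatively, cut at a height and let the number of groups follow from it.
h_cut <- median(clust_toy$height)  # arbitrary illustrative threshold
groups_h <- cutree(clust_toy, h = h_cut)
table(table(groups_h))             # distribution of group sizes
```

Cutting by height can be more natural than fixing `k` when there is no prior idea of how many groups to expect, although the threshold then has to be chosen by inspecting the dendrogram.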
Thank you very much.
PS: Any other suggestions on how to group this crazy dataset would be welcome.