I am attempting to run a cluster analysis (PAM) on a financial dataset with a lot of noise.
There are well over 100 variables, many of which are highly collinear.
Running the clustering algorithm on all of the columns makes little sense given the amount of noise and collinearity. I also do not want to use PCA, because I would end up with components rather than ranges of the existing variables for each cluster, and I plan to analyze those ranges further.
By assessing the clustering tendency (Hopkins statistic) of a defined group of, say, 10 variables, I can determine whether clustering is viable for that group. My question is whether there is a way to loop the Hopkins statistic over every possible group of, say, 10 variables, so that I can run the clustering algorithm on the group with the best Hopkins statistic.
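To make the idea concrete, here is a minimal sketch of the kind of loop I have in mind, in Python with NumPy only. The function names `hopkins` and `random_subset_search` are hypothetical, not from any library. The sketch assumes the data is a numeric array with observations in rows, standardizes each column, and uses the convention where values near 0.5 suggest no structure and values near 1 suggest clusters (some implementations report the complement instead). Because 100 variables give roughly 1.7 × 10^13 subsets of size 10, a truly exhaustive loop does not look feasible, so the sketch scores randomly sampled subsets instead.

```python
import numpy as np


def hopkins(X, m=None, rng=None):
    """Hopkins statistic of X (rows = observations), using plain Euclidean distances."""
    rng = np.random.default_rng(rng)
    X = np.asarray(X, dtype=float)
    n, d = X.shape
    # Standardize columns so no single variable dominates the distances (assumption).
    std = X.std(axis=0)
    std[std == 0] = 1.0
    X = (X - X.mean(axis=0)) / std
    # Sample size: about 10% of the observations unless given explicitly.
    m = m or max(1, n // 10)
    idx = rng.choice(n, size=m, replace=False)
    # Reference points drawn uniformly from the bounding box of the data.
    uniform = rng.uniform(X.min(axis=0), X.max(axis=0), size=(m, d))
    # u: distance from each uniform point to its nearest real observation.
    u = np.linalg.norm(uniform[:, None, :] - X[None, :, :], axis=2).min(axis=1)
    # w: distance from each sampled observation to its nearest *other* observation.
    dist = np.linalg.norm(X[idx][:, None, :] - X[None, :, :], axis=2)
    dist[np.arange(m), idx] = np.inf  # exclude each point's distance to itself
    w = dist.min(axis=1)
    return u.sum() / (u.sum() + w.sum())


def random_subset_search(X, k=10, n_trials=200, seed=0):
    """Score n_trials random k-column subsets and return (best_score, best_columns)."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    p = X.shape[1]
    best_score, best_cols = -np.inf, None
    for _ in range(n_trials):
        cols = np.sort(rng.choice(p, size=k, replace=False))
        score = hopkins(X[:, cols], rng=rng)
        if score > best_score:
            best_score, best_cols = score, cols.tolist()
    return best_score, best_cols
```

Something like `score, cols = random_subset_search(data, k=10, n_trials=500)` would then give the column indices to pass to PAM. The statistic is itself computed from a random sample, so the maximum over many subsets is likely to be optimistic; averaging several evaluations per subset would probably be safer.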
I may be way off base with this, but any advice is appreciated.