How to choose the best threshold when removing multicollinear features with hierarchical clustering on Spearman rank correlation?

Asked Jul 10 '23 at 22:53

Active Jul 10 '23 at 22:53

Viewed 15 times

I am currently reading this scikit-learn example, https://scikit-learn.org/stable/auto_examples/inspection/plot_permutation_importance_multicollinear.html#handling-multicollinear-features, in order to handle multicollinearity in a dataset. It says: "Next, we manually pick a threshold by visual inspection of the dendrogram to group our features into clusters and choose a feature from each cluster to keep, select those features from our dataset, and train a new random forest." I am not sure how to pick a threshold for different datasets. Is there a default value that should always work, do I need to interpret the dendrogram somehow, or is there a Python implementation that does this automatically?
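For context, the procedure described in the quoted passage can be sketched roughly as follows. This is a minimal sketch, not the documentation's exact code: the helper name `select_features`, the synthetic data, and the threshold value `0.5` are assumptions made for illustration only. The threshold is the cut height on the dendrogram, measured in the units of the Ward linkage built from the `1 - |Spearman correlation|` distances.

```python
from collections import defaultdict

import numpy as np
from scipy.cluster import hierarchy
from scipy.spatial.distance import squareform
from scipy.stats import spearmanr


def select_features(X, threshold):
    """Return one column index per cluster of correlated features.

    `select_features` is a hypothetical helper; `threshold` is the
    dendrogram cut height that the question is asking how to choose.
    Assumes X has at least three columns, so that spearmanr returns
    a correlation matrix rather than a scalar.
    """
    # Pairwise Spearman rank correlation between the columns of X.
    corr = spearmanr(X).correlation
    # Force exact symmetry and a unit diagonal before converting.
    corr = (corr + corr.T) / 2
    np.fill_diagonal(corr, 1)
    # Strongly correlated features (positive or negative) get a small distance.
    distance_matrix = 1 - np.abs(corr)
    # Ward linkage on the condensed form of the distance matrix.
    linkage = hierarchy.ward(squareform(distance_matrix))
    # Cut the tree at the chosen height to get flat cluster labels.
    cluster_ids = hierarchy.fcluster(linkage, threshold, criterion="distance")
    clusters = defaultdict(list)
    for feature_idx, cluster_id in enumerate(cluster_ids):
        clusters[cluster_id].append(feature_idx)
    # Keep the first (lowest-index) feature of every cluster.
    return sorted(ids[0] for ids in clusters.values())


# Synthetic data (an assumption): columns 1 and 3 are near-copies of 0 and 2.
rng = np.random.default_rng(0)
a, b, c = rng.normal(size=(3, 500))
X = np.column_stack([
    a,
    a + 0.01 * rng.normal(size=500),
    b,
    b + 0.01 * rng.normal(size=500),
    c,
])
kept = select_features(X, threshold=0.5)
print(kept)
```

Lowering the threshold yields more, smaller clusters and therefore keeps more features; raising it merges clusters and keeps fewer. `hierarchy.dendrogram(linkage)` can be used to plot the tree that the quoted passage refers to inspecting visually.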

Thank you!


John B


0 Answers