
During feature selection (after extensive feature engineering), is there a set of rules that governs which features to drop and which to keep? I know that highly correlated features should be dropped or merged into new features. However, I have also seen people drop columns that don't have a high correlation with other features but have a significantly higher correlation (> 0.8) with the target variable. Why is that? Any other tips and help are appreciated.

P.S. : I know this question is pretty broad and isn't exactly on a very specific topic.

Prune
SOURIN ROY
  • I’m voting to close this question because it is off-topic. Please consider [CrossValidated](https://stats.stackexchange.com) or [Data Science](https://datascience.stackexchange.com). However, I am not voting to auto-migrate because the question is also too broad. – merv Sep 04 '20 at 04:00
  • Please repeat [on topic](https://stackoverflow.com/help/on-topic) and [how to ask](https://stackoverflow.com/help/how-to-ask) from the [intro tour](https://stackoverflow.com/tour). Most notably, we expect you to do appropriate research before posting a question here, and your central issue is covered in quite a few on-line resources. Also, I removed the superfluous "Python" tag and the redundant "correlation" tag. – Prune Sep 04 '20 at 04:13

1 Answer


Regarding your P.S., I agree. This probably isn't the right forum for this question, and you might find more thorough and detailed responses on a different Stack Exchange site, such as the one for theoretical computer science: https://cstheory.stackexchange.com/

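While we're here, though, I might as well throw in my two cents. Multicollinearity means that features (IVs) are highly correlated with one another, not with the target (DV). It makes coefficient estimates in linear models unstable and hard to interpret, which is why people drop or merge such features. A feature that has low correlation with the other features but a very high correlation (> 0.8) with the target is a different case. On its own, that usually marks a strong predictor, so when people drop such a column, the likely reason is suspected target leakage: the column encodes information that would not be available at prediction time. Here's a link to a further explanation of multicollinearity: https://towardsdatascience.com/https-towardsdatascience-com-multicollinearity-how-does-it-create-a-problem-72956a49058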
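To make the two checks concrete, here is a minimal sketch using only NumPy. Everything in it is an assumption for illustration: the data is synthetic, the feature names (`a`, `b`, `c`, `leak`) and the helper `correlation_report` are hypothetical, and the 0.8 cutoffs are simply the figure from your question. It flags feature pairs that are highly correlated with each other, computes a variance inflation factor (VIF) per feature from the inverse of the feature correlation matrix, and separately lists features whose correlation with the target is suspiciously high.

```python
import numpy as np


def correlation_report(X, y, names, pair_threshold=0.8, target_threshold=0.8):
    """Hypothetical helper: summarize feature-feature and feature-target correlation."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n_features = X.shape[1]

    # Pearson correlation between every pair of feature columns.
    corr = np.corrcoef(X, rowvar=False)

    # VIF_j = 1 / (1 - R_j^2), which equals the j-th diagonal entry
    # of the inverse of the feature correlation matrix.
    vif = np.diag(np.linalg.inv(corr))

    # Correlation of each feature with the target.
    target_corr = [np.corrcoef(X[:, j], y)[0, 1] for j in range(n_features)]

    collinear_pairs = [
        (names[i], names[j], float(corr[i, j]))
        for i in range(n_features)
        for j in range(i + 1, n_features)
        if abs(corr[i, j]) > pair_threshold
    ]
    target_suspects = [
        names[j] for j in range(n_features)
        if abs(target_corr[j]) > target_threshold
    ]
    return {
        "collinear_pairs": collinear_pairs,
        "vif": {names[j]: float(vif[j]) for j in range(n_features)},
        "target_corr": {names[j]: float(target_corr[j]) for j in range(n_features)},
        "target_suspects": target_suspects,
    }


# Synthetic data (assumption): b nearly duplicates a, c is independent,
# and leak is the target plus a little noise.
rng = np.random.default_rng(0)
n = 500
a = rng.normal(size=n)
b = a + 0.1 * rng.normal(size=n)
c = rng.normal(size=n)
y = 0.2 * a + 0.2 * c + rng.normal(size=n)
leak = y + 0.1 * rng.normal(size=n)

names = ["a", "b", "c", "leak"]
X = np.column_stack([a, b, c, leak])
report = correlation_report(X, y, names)

print("collinear pairs:", report["collinear_pairs"])
print("VIF:", report["vif"])
print("high target correlation:", report["target_suspects"])
```

In this toy setup the `a`/`b` pair is what multicollinearity looks like (a common rule of thumb treats a VIF above roughly 5 to 10 as a concern), while `leak` is the kind of column worth investigating for leakage before deciding whether to keep it. A high target correlation alone doesn't prove leakage; you'd still need to check how and when the column is produced.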

Hope you find what you're looking for, and happy tuning!

lemonlin