During feature selection (after extensive feature engineering), is there a set of rules that governs which features to drop and which to keep? I know that highly correlated features should be dropped or merged into new features. However, I have also seen people drop columns that are not highly correlated with other features but have a significantly higher correlation (> 0.8) with the target variable. Why is that? Any other tips and help are appreciated.
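To make the two cases concrete, here is a minimal sketch of the checks I am describing. The helper names (`pearson`, `flag_features`), the toy columns `x1` to `x4`, and the synthetic data are made up for illustration; only the 0.8 threshold comes from what I have seen. It uses plain Python so it runs anywhere, though in practice pandas' `DataFrame.corr()` would give the same Pearson coefficients.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def flag_features(features, target, threshold=0.8):
    """Return (redundant_pairs, target_suspects) for a dict of feature columns.

    redundant_pairs: feature pairs whose absolute correlation exceeds threshold.
    target_suspects: features whose absolute correlation with the target does.
    """
    names = list(features)
    redundant_pairs = [
        (a, b)
        for i, a in enumerate(names)
        for b in names[i + 1:]
        if abs(pearson(features[a], features[b])) > threshold
    ]
    target_suspects = [
        name for name in names
        if abs(pearson(features[name], target)) > threshold
    ]
    return redundant_pairs, target_suspects


# Hypothetical toy data, seeded so the run is reproducible.
rng = random.Random(0)
n = 200
x1 = [rng.gauss(0, 1) for _ in range(n)]
x2 = [v + rng.gauss(0, 0.1) for v in x1]       # near-duplicate of x1
x3 = [rng.gauss(0, 1) for _ in range(n)]       # unrelated to everything else
target = [rng.gauss(0, 1) for _ in range(n)]
x4 = [t + rng.gauss(0, 0.1) for t in target]   # almost a copy of the target

features = {"x1": x1, "x2": x2, "x3": x3, "x4": x4}
redundant_pairs, target_suspects = flag_features(features, target)
print("Highly correlated feature pairs:", redundant_pairs)
print("Features highly correlated with the target:", target_suspects)
```

The first list is the case I understand (drop or merge one feature of each pair). The second list is the case I am asking about: a column like `x4` is not redundant with any other feature, yet I have seen columns like it dropped.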
P.S.: I know this question is pretty broad and isn't about one very specific topic.