
I'm working on a binary classification problem; my training data has millions of records and ~2000 variables. I'm running LightGBM for feature selection and using the features it selects to fit a neural network (using Keras) for predictions. I have a couple of questions on the approach I'm following.

  1. I'm doing hyper-parameter tuning when using LightGBM for feature selection, on the understanding that as the hyper-parameters change, the selected features will change too. I'm using the 'goss' algorithm and 'gain' as the feature importance type. I have seen a couple of articles that use LightGBM for feature selection, but none of them tunes hyper-parameters; they just use the default settings. Is my approach correct?
  2. Is it OK to use LightGBM for feature selection and then a neural network, built on the features LightGBM selects, for predictions?

Any help is much appreciated. Thanks

Haritha
  • To help with (2), I'm not sure combining the outcome of LightGBM's feature selection with your NN is the right way to go. GBMs have their own way of picking splits and putting together the final model (which, keep in mind, is a combination of weak learners). This feature selection might work for boosting but might not achieve what you're aiming for by feeding those features into a NN. Neural networks pick up on signals differently than trees do, so I'm not sure it would serve your purpose. – babygrogu Jul 16 '20 at 14:41
  • Thanks a lot for replying. In that case, should I do feature selection based on filter methods which are independent of classification algorithms? I want to understand which feature selection methods are generally used before running NN. – Haritha Jul 20 '20 at 04:50
  • This reminds me of the boruta package. It uses random forests to check whether a feature contains more useful info than a column filled with random numbers. Is this maybe what you want to do? – jottbe Oct 03 '20 at 22:49

1 Answer


Gradient boosting algorithms are a valid approach to identifying features, but not the most efficient one, because these methods are heuristics and very costly; in other words, their running time is much higher than that of other methods.

Regarding hyper-parameter tuning for feature selection: often, different hyper-parameters end up selecting the same feature set, just with different split values. For example, imagine model 1 is A > 3.4 and B < 2.7, where A and B are features, and model 2 is A > 3.2 and B < 2.5. They are different models, and one may perform much better, but in the end they use the same features! Your goal is not to find the best boosting model, because you intend to build a deep learning model on top. However, in your case with ~2000 features it may be a bit different, depending on the depth of the trees and the number of features used to build them.
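To make that concrete, here is a toy illustration (the importance numbers are made up): two differently-tuned models assign different gain values, but the ranking, and therefore the selected top-k set, is identical.

```python
def top_k_features(importances, k):
    """Return the set of the k feature names with the highest importance."""
    ranked = sorted(importances, key=importances.get, reverse=True)
    return set(ranked[:k])

# Hypothetical gain importances from two hyper-parameter settings
model1_gain = {"A": 130.0, "B": 90.0, "C": 15.0, "D": 2.0}
model2_gain = {"A": 210.0, "B": 160.0, "C": 40.0, "D": 9.0}

# Both settings select {"A", "B"}: the values differ, the features do not.
same = top_k_features(model1_gain, 2) == top_k_features(model2_gain, 2)
```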

In general, it is not common to do hyper-parameter tuning at the feature-selection phase; it is done during model building instead. Particularly with deep learning models, one aims to be as inclusive as possible with the features.
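In other words, the tuning effort is better spent on the downstream model. A minimal sketch, with scikit-learn's GridSearchCV and MLPClassifier standing in for whatever tuner you would use with the Keras model (the grid values are arbitrary):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.neural_network import MLPClassifier

# Synthetic data; in practice X would hold the columns LightGBM kept.
X, y = make_classification(n_samples=600, n_features=20,
                           n_informative=6, random_state=0)

# Fix the feature set and tune the network itself,
# not the selection step.
grid = GridSearchCV(
    MLPClassifier(max_iter=400, random_state=0),
    param_grid={"hidden_layer_sizes": [(16,), (32,)],
                "alpha": [1e-4, 1e-2]},
    cv=3,
)
grid.fit(X, y)
best = grid.best_params_
```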

Areza