I have a train and a test data set, each containing 30 independent features and 1 target feature. All the features are numerical variables. The test data set has the same columns as the train data set, an example of which looks like this:
Target | col1 | col2 | ... | col29 | col30 |
---|---|---|---|---|---|
20 | 12 | 14 | ... | 15 | 12 |
25 | 13 | 25 | ... | 19 | 19 |
I want to write efficient code that fits a LightGBM regressor on every combination of the features, evaluates each on the test data set, and ranks the combinations to find the one that gives the best MAE.
An example of the result output I am looking for:
Rank | Features_used | MAE |
---|---|---|
1 | col1,col2,col14,col17,col18 | 2.40 |
2 | col4,col5,col15,col19,col24 | 2.50 |
3 | col4,col5,col15,col19,col24,col29,col18,col13 | 2.50 |
... | ... | ... |
n | worst combination of features | Worst MAE |
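A note on scale: with 30 features there are 2^30 − 1 ≈ 1.07 billion non-empty subsets, so I assume any practical sweep has to cap the subset size. Enumerating the candidate combinations up to a cap is straightforward with itertools.combinations (the column names and the max_size value below are illustrative, matching the example above):

```python
from itertools import combinations

# Assumed column names, matching the example table above
feature_cols = [f'col{i}' for i in range(1, 31)]

max_size = 5  # illustrative cap; without one, the sweep is 2**30 - 1 subsets
subsets = [combo
           for r in range(1, max_size + 1)
           for combo in combinations(feature_cols, r)]
print(len(subsets))  # number of candidate feature combinations up to max_size
```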
I have tried passing each combination of features individually and computing the MAE, but doing that by hand for every combination is clearly inefficient. For a single combination I currently do:
```python
import lightgbm
from sklearn.metrics import mean_absolute_error

Predict = 'Target'

# One manually chosen combination of features
cols = ['col1', 'col2', 'col3', 'col4', 'col5']
X_train = train[cols]
X_test = test[cols]
y_train = train[Predict]
y_test = test[Predict]

regressor = lightgbm.LGBMRegressor()
# eval_metric has no effect without an eval_set, so it is dropped here
regressor.fit(X_train, y_train)
y_pred = regressor.predict(X_test)
mae = mean_absolute_error(y_test, y_pred)
```
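My attempt at extending this into a brute-force loop over all the combinations (a sketch that assumes train and test are the full DataFrames from above, uses scikit-learn's mean_absolute_error, and reuses an illustrative max_size cap):

```python
from itertools import combinations

import lightgbm
import pandas as pd
from sklearn.metrics import mean_absolute_error

Predict = 'Target'
feature_cols = [c for c in train.columns if c != Predict]

max_size = 3  # illustrative cap on subset size to keep the run tractable
results = []
for r in range(1, max_size + 1):
    for combo in combinations(feature_cols, r):
        cols = list(combo)
        regressor = lightgbm.LGBMRegressor()
        regressor.fit(train[cols], train[Predict])
        y_pred = regressor.predict(test[cols])
        mae = mean_absolute_error(test[Predict], y_pred)
        results.append({'Features_used': ','.join(cols), 'MAE': mae})

# Rank the combinations, best (lowest) MAE first, as in the table above
ranking = pd.DataFrame(results).sort_values('MAE').reset_index(drop=True)
ranking.index += 1
ranking.index.name = 'Rank'
print(ranking)
```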
Is there a more efficient way to run all the combinations of features and rank the output by MAE?