I'm currently using xgb.train(...), which returns a booster, but I'd like to use RFE to select the best 100 features. The returned booster cannot be used in RFE because it's not a sklearn estimator. XGBClassifier is the sklearn API into the xgboost library; however, I'm not able to get the same results as with the xgb.train(...) method (10% worse on roc-auc). I've tried the sklearn boosters, but they aren't able to get similar results either. I've also tried to wrap the xgb.train(...) method in a class to add the sklearn estimator methods, but there are just too many to change. Is there some way to use xgb.train(...) along with RFE from sklearn?
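For reference, a stripped-down version of that wrapper attempt (a sketch only: it assumes numpy input and binary classification, and the class and parameter names are mine; RFE mainly needs fit plus a feature_importances_ attribute on the fitted estimator):

import numpy as np
import xgboost as xgb
from sklearn.base import BaseEstimator, ClassifierMixin

class XGBTrainWrapper(BaseEstimator, ClassifierMixin):
    # Bare-minimum sklearn-style wrapper around xgb.train;
    # `params` would carry the tuned xgb.train parameter dict
    def __init__(self, params=None, num_boost_round=100):
        self.params = params
        self.num_boost_round = num_boost_round

    def fit(self, X, y):
        self.n_features_ = X.shape[1]
        dtrain = xgb.DMatrix(X, label=y)
        self.booster_ = xgb.train(self.params or {}, dtrain,
                                  num_boost_round=self.num_boost_round)
        return self

    @property
    def feature_importances_(self):
        # RFE ranks features by this attribute; features never split on
        # are absent from get_score(), so they default to 0
        scores = self.booster_.get_score(importance_type='gain')
        return np.array([scores.get(f'f{i}', 0.0)
                         for i in range(self.n_features_)])

    def predict(self, X):
        return (self.booster_.predict(xgb.DMatrix(X)) > 0.5).astype(int)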

XGBoost has an sklearn wrapper already. Does that work for you? https://xgboost.readthedocs.io/en/latest/python/python_api.html#module-xgboost.sklearn – hume May 17 '21 at 17:42
1 Answer
For this kind of problem, I created shap-hypetune: a Python package for simultaneous hyperparameter tuning and feature selection for gradient boosting models. In your case, this enables you to perform RFE with XGBClassifier in a very simple and intuitive way:
from xgboost import XGBClassifier
from shaphypetune import BoostRFE

# RFE wrapper: drops `step` features per round until `min_features_to_select` remain
model = BoostRFE(XGBClassifier(), min_features_to_select=1, step=1)
model.fit(X_train, y_train, eval_set=[(X_valid, y_valid)], early_stopping_rounds=6, verbose=0)
pred = model.predict(X_test)
As you can see, you can use all the fitting options available in the standard XGB API, like early_stopping_rounds or custom metrics, to customize the training process.
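For instance, a sketch targeting the question's setup (the 'auc' metric string and the greater_is_better flag follow the XGBoost and shap-hypetune docs respectively; the values are illustrative):

from xgboost import XGBClassifier
from shaphypetune import BoostRFE

# select down to the best 100 features, judged on validation roc-auc
model = BoostRFE(XGBClassifier(), min_features_to_select=100, step=5,
                 greater_is_better=True)       # AUC: higher is better
model.fit(X_train, y_train,
          eval_set=[(X_valid, y_valid)],
          eval_metric='auc',                   # string metric; custom callables also work
          early_stopping_rounds=6, verbose=0)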
You can also use shap-hypetune to perform parameter tuning (simultaneously with feature selection, if desired), or to run feature selection with RFE or Boruta using SHAP feature importance; a rough sketch of both variants follows. A full example is available here.
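A rough sketch of those two variants (param_grid, importance_type='shap_importances', train_importance, and BoostBoruta are taken from the shap-hypetune README; the grid values are illustrative):

from xgboost import XGBClassifier
from shaphypetune import BoostRFE, BoostBoruta

# RFE plus hyperparameter tuning in one pass, ranking features by SHAP importance
param_grid = {'n_estimators': [150, 300], 'max_depth': [4, 6]}   # illustrative grid
model = BoostRFE(XGBClassifier(), param_grid=param_grid,
                 min_features_to_select=100, step=5,
                 importance_type='shap_importances',
                 train_importance=False)       # compute SHAP on the eval_set
model.fit(X_train, y_train, eval_set=[(X_valid, y_valid)],
          early_stopping_rounds=6, verbose=0)

# Boruta variant with the same SHAP-based ranking
model = BoostBoruta(XGBClassifier(),
                    importance_type='shap_importances', train_importance=False)
model.fit(X_train, y_train, eval_set=[(X_valid, y_valid)],
          early_stopping_rounds=6, verbose=0)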
