
I usually get to feature importance using

    from xgboost import XGBClassifier

    regr = XGBClassifier()
    regr.fit(X, y)
    regr.feature_importances_  # array of importances, one per column of X

where type(regr) is xgboost.sklearn.XGBClassifier.

However, I have a pickled XGBoost model which, when unpickled, returns an object of type xgboost.core.Booster. This is the same object I would get if I ran regr.get_booster().

I have found a few solutions for getting variable importance from a Booster object, but is there a way to get to the classifier object from the Booster so I can just use the same feature_importances_ attribute? That seems like the most straightforward solution; otherwise it looks like I have to write a function that mimics the output of feature_importances_ so that it matches my logged feature importances (roughly the sketch below)...
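Something along these lines, I assume, using the Booster's get_score() and feature_names and normalizing the same way feature_importances_ does (untested sketch):

    import numpy as np

    def booster_feature_importances(booster, importance_type="gain"):
        # get_score() returns {feature_name: score} and skips features that were never used
        scores = booster.get_score(importance_type=importance_type)
        names = booster.feature_names  # column order the Booster was trained with
        raw = np.array([scores.get(name, 0.0) for name in names])
        total = raw.sum()
        # feature_importances_ is normalized to sum to 1, so mirror that here
        return raw / total if total > 0 else raw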

So ideally I'd have something like

    import pickle

    xgb_booster = pickle.load(open("xgboost-model", "rb"))
    assert str(type(xgb_booster)) == "<class 'xgboost.core.Booster'>", 'wrong class'
    xgb_classifier = xgb_booster.get_classifier()  # the method I wish existed
    xgb_classifier.feature_importances_

Are there any limitations to what can be done with a Booster object in terms of finding the classifier? I figure there's some combination of save/load/dump that will get me what I need, but I'm stuck for now...

Also, for context: the pickled model is the output from AWS SageMaker, so I'm just unpacking it to do some further evaluation.


2 Answers


Based on my own experience trying to recreate a classifier from a Booster object generated by SageMaker, I learned the following:

  1. It doesn't appear to be possible to recreate the classifier from the booster. :(
  2. The Booster class documentation (https://xgboost.readthedocs.io/en/latest/python/python_api.html#xgboost.Booster) has the details on the Booster class, so you can review what it can and cannot do.

Crazy things you can do, however:

  1. You can create a classifier object and then override the Booster within it:

    import xgboost as xgb

    xgb_classifier = xgb.XGBClassifier(**xgboost_params)

    [..]

    # overwrite the internal Booster of the (unfitted) sklearn wrapper
    xgb_classifier._Booster = booster

This is nearly useless unless you fit it; otherwise it doesn't have any feature data. (I didn't go all the way through this scenario to validate whether fitting would provide the feature data required to be functional.)

  2. You can train a classifier with xgboost directly, remove the Booster object from it, and pickle that classifier; later you restore the SageMaker Booster back into it (rough sketch below). This abomination is closer and appears to work, but it is not truly a rehydrated classifier object from the SageMaker output alone.
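Roughly like this (untested sketch; X_local, y_local, and sagemaker_booster are placeholders for your own local training data and the unpickled SageMaker model):

    import pickle
    import xgboost as xgb

    # Fit a classifier locally so the sklearn wrapper is populated,
    # then strip out its Booster and pickle the empty shell.
    shell = xgb.XGBClassifier()
    shell.fit(X_local, y_local)
    shell._Booster = None
    pickle.dump(shell, open("classifier-shell.pkl", "wb"))

    # Later: drop the SageMaker Booster into the shell and use it like a classifier.
    clf = pickle.load(open("classifier-shell.pkl", "rb"))
    clf._Booster = sagemaker_booster
    clf.feature_importances_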

Recommendation

If you’re not stuck using the SageMaker training solution, you can certainly train with XGBoost directly. At that point you have access to everything you need to dump/save the data for use in a different context.
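For example (sketch only; X_train, y_train and the parameter values are placeholders), training directly gives you the sklearn-style importances plus a model file you can reload anywhere a recent xgboost is available:

    import xgboost as xgb

    clf = xgb.XGBClassifier(n_estimators=100, max_depth=5, random_state=42)
    clf.fit(X_train, y_train)

    clf.feature_importances_                               # sklearn-style importances
    clf.get_booster().get_score(importance_type="gain")    # raw Booster importances
    clf.save_model("model.json")                           # portable copy of the model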

I know you're after feature importance, so I hope this gets you closer. I had a different use case and was ultimately able to leverage the Booster for what I needed.


I was able to get an xgboost.XGBClassifier model that is virtually identical to the xgboost.Booster version by:

(1) extracting all the tuning parameters from the Booster model using:

    import json
    json.loads(your_booster_model.save_config())

(2) setting those same tuning parameters and then training an XGBClassifier model on the same training dataset that was used to train the Booster model (rough sketch below).

Note: one mistake I made was forgetting to explicitly assign the same seed/random_state in both the Booster and the Classifier versions.
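Putting it together, something like this (sketch; X_train, y_train are placeholders, and the exact nesting of the saved config varies by xgboost version, so inspect it and copy out the fields you need):

    import json
    import xgboost as xgb

    # 1. Extract the tuning parameters from the Booster's saved config.
    config = json.loads(your_booster_model.save_config())
    print(json.dumps(config, indent=2))  # inspect and copy the relevant parameters

    # 2. Re-train an XGBClassifier with the same parameters, the same data,
    #    and the same seed, then read the importances off the sklearn wrapper.
    clf = xgb.XGBClassifier(
        max_depth=6,        # placeholder values: copy these from the config above
        learning_rate=0.3,
        n_estimators=100,
        random_state=42,
    )
    clf.fit(X_train, y_train)
    clf.feature_importances_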