XGBClassifier: Bad predictions after training, saving and loading a model

Question

Below is how I trained an XGBClassifier and saved it:

import pickle
from xgboost import XGBClassifier

# train
model = XGBClassifier()
model.fit(X, y)

# export (the with block closes the file, so the pickle is flushed to disk)
with open('model.pickle', 'wb') as f:
    pickle.dump(model, f)

This is how I loaded the model and made predictions:

with open('model.pickle', 'rb') as f:
    loaded_model = pickle.load(f)
y_pred = loaded_model.predict(X)

The predictions are fine when the model is loaded in the same Python process that trained it, but they look random when the model is loaded in a different Python process.

Note: I see the same problem when I use model.save_model and model.load_model instead of pickle.

The simple checks I did show that the model was saved and loaded properly: the dumps of model._Booster (produced with model._Booster.dump_model(some_file)) and of loaded_model._Booster are identical.
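For reference, a minimal sketch of that comparison done in Python rather than with a shell diff. It assumes the two dumps have already been written to text files; the helper name dumps_match is made up for illustration and is not part of xgboost.

```python
import filecmp


def dumps_match(orig_path, loaded_path):
    """Return True if the two dump files have identical contents."""
    # shallow=False forces a content comparison instead of trusting
    # file size and modification time alone.
    return filecmp.cmp(orig_path, loaded_path, shallow=False)
```

A True result only says the dumped trees are the same text; it says nothing about anything that is not written to the dump.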

Python version: 3.7.5

xgboost version: tried both 0.80 and 0.90

Any suggestion is appreciated.

@yatu `model._Booster` holds the learnt trees. `model._Booster.dump_model(file)` will dump the model (including the trees). So I made a dump for the original trained model (`model._Booster.dump_model('orig.txt')`), and a dump for the loaded version (`loaded_model._Booster.dump_model('loaded.txt')`), and compared them using bash `diff orig.txt loaded.txt`. I don't know, though, whether there are other parts of the model not covered by this check. — mohaseeb, Jan 09 '20 at 17:19
Great question! I'm experiencing a similar issue. Any findings on this? — mrcw, Feb 15 '22 at 12:59

score 0 · Answer 1 · answered Aug 19 '22 at 19:56

In my case, I had changed the column order at prediction time, which led to different performance. The column order of the training data and the prediction data must be the same.
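A minimal sketch of that fix, assuming you kept the list of column names in the order used for training. The helper align_columns and the column names are made up for illustration, and plain lists are used so the example runs without any extra packages.

```python
def align_columns(rows, current_columns, training_columns):
    """Reorder each row so its values follow the training column order."""
    missing = set(training_columns) - set(current_columns)
    if missing:
        raise ValueError(f"missing columns: {sorted(missing)}")
    # Position of each training column inside the incoming rows.
    index = [current_columns.index(name) for name in training_columns]
    return [[row[i] for i in index] for row in rows]


training_columns = ["age", "income", "score"]  # hypothetical order used for fit()
current_columns = ["score", "age", "income"]   # order at prediction time
rows = [[0.9, 31, 52000], [0.4, 45, 61000]]

aligned = align_columns(rows, current_columns, training_columns)
print(aligned)  # [[31, 52000, 0.9], [45, 61000, 0.4]]
```

If the data is in a pandas DataFrame, selecting with the saved list (X_new[training_columns]) gives the same reordering. Saving that list next to the model file makes it available in the other process.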

Nishant Kumar
