
Below is how I trained an XGBClassifier and saved it:

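import pickle
from xgboost import XGBClassifier

# train (X, y are the training features and labels)
model = XGBClassifier()
model.fit(X, y)

# export; the with-block closes the file so the pickle is fully written
with open('model.pickle', 'wb') as f:
    pickle.dump(model, f)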

This is how I loaded the model and made predictions:

with open('model.pickle', 'rb') as f:
    loaded_model = pickle.load(f)
y_pred = loaded_model.predict(X)

The predictions are correct when the model is loaded in the same Python process that trained it. When the model is loaded in a different Python process, the predictions are wrong (essentially random).

Note: I have the same problem when I use model.save_model and model.load_model instead of pickle.

The simple checks I did show that the model is saved and loaded correctly: the dumps of model._Booster (obtained via model._Booster.dump_model(some_file)) and of loaded_model._Booster are identical.
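The tree-dump comparison only covers what dump_model writes out. A complementary check compares predictions directly. In the training process, store a handful of input rows together with the predictions made right after training. After loading the model in a fresh process, re-predict those same rows. The sketch below is only an illustration. The helpers save_probe and check_probe and the file name probe.json are hypothetical. Rows must be plain lists so they can be written as JSON, and predictions must be 1-D numbers (for example the output of predict, or one column of predict_proba).

```python
import json
import math


def save_probe(path, rows, predictions):
    """Store sample input rows and the predictions made right after training.

    `rows` must be JSON-serialisable (e.g. a list of lists of floats);
    predictions are converted to plain floats so NumPy scalars serialise too.
    """
    with open(path, "w") as f:
        json.dump({"rows": rows, "predictions": [float(p) for p in predictions]}, f)


def check_probe(path, predict, rel_tol=1e-6):
    """Re-run `predict` on the stored rows; return indices whose prediction changed."""
    with open(path) as f:
        probe = json.load(f)
    new_preds = [float(p) for p in predict(probe["rows"])]
    if len(new_preds) != len(probe["predictions"]):
        raise ValueError("predict returned a different number of rows")
    # Indices where the freshly loaded model disagrees with the original one
    return [
        i
        for i, (old, new) in enumerate(zip(probe["predictions"], new_preds))
        if not math.isclose(old, new, rel_tol=rel_tol)
    ]
```

The following usage assumes X is a NumPy array and numpy is imported as np. In the training process, call save_probe('probe.json', X[:20].tolist(), model.predict(X[:20])). In the new process, call check_probe('probe.json', lambda rows: loaded_model.predict(np.asarray(rows))). A non-empty result means the two processes disagree on identical inputs. If the check passes but predictions on real data still differ, the data fed to the model at prediction time is the more likely suspect.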

Python version: 3.7.5

xgboost version: tried both 0.80 and 0.90

Any suggestion is appreciated.

mohaseeb
  • How do you actually compare both trained models? – yatu Jan 09 '20 at 16:44
  • @yatu `model._Booster` holds the learnt trees. `model._Booster.dump_model(file)` dumps the model, including the trees. So I made a dump of the original trained model (`model._Booster.dump_model('orig.txt')`) and a dump of the loaded version (`loaded_model._Booster.dump_model('loaded.txt')`), and compared them with bash `diff orig.txt loaded.txt`. I don't know, though, whether there are other parts of the model that this check does not cover. – mohaseeb Jan 09 '20 at 17:19
  • I am facing a similar issue. Is there any fix for this? – yudhiesh Sep 17 '21 at 04:11
  • Great question! I'm experiencing a similar issue. Any findings on this? – mrcw Feb 15 '22 at 12:59

1 Answer


In my case, I had changed the column order of the data at prediction time, which led to different (wrong) predictions. The columns of the prediction data must be in the same order as the columns of the training data.
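One way to guard against reordered columns is to persist the training column order together with the model. Before calling predict, reorder the inputs by name rather than by position. The sketch below is illustrative only. The function names are hypothetical, and any picklable object stands in for the XGBClassifier. Rows are given as dicts so the example needs only the standard library.

```python
import pickle


def save_model_with_features(model, feature_names, path):
    """Pickle the model together with the column order it was trained on."""
    with open(path, "wb") as f:
        pickle.dump({"model": model, "feature_names": list(feature_names)}, f)


def load_model_with_features(path):
    """Return (model, feature_names) as saved by save_model_with_features."""
    with open(path, "rb") as f:
        bundle = pickle.load(f)
    return bundle["model"], bundle["feature_names"]


def align_rows(rows, feature_names):
    """Convert dict rows to lists whose values follow the training column order.

    Raises KeyError listing any missing features instead of silently
    shifting the remaining columns into the wrong positions.
    """
    aligned = []
    for row in rows:
        missing = [name for name in feature_names if name not in row]
        if missing:
            raise KeyError(f"missing features: {missing}")
        aligned.append([row[name] for name in feature_names])
    return aligned
```

With pandas, the equivalent of align_rows is selecting the columns by the saved list, e.g. X_new = X_new[feature_names]. This selection reorders the columns to match training and raises a KeyError if one of them is absent.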