
I am probably looking right over it in the documentation, but I wanted to know if there is a way with XGBoost to generate both the prediction and the probability for that prediction. In my case I am training a multi-class classifier, and it would be great if I could return something like Medium - 88%:

  • Classifier = Medium
  • Probability of Prediction = 88%

Parameters

params = {
    'max_depth': 3,
    'objective': 'multi:softmax',  # multiclass objective; predict returns class labels
    'num_class': 3,
    'n_gpus': 0
}

Prediction

pred = model.predict(D_test)

Results

array([2., 2., 1., ..., 1., 2., 2.], dtype=float32)

User-friendly labels (label encoder)

pred_int = pred.astype(int)
label_encoder.inverse_transform(pred_int[:5])
array(['Medium', 'Medium', 'Low', 'Low', 'Medium'], dtype=object)

EDIT: @Reveille suggested predict_proba. I am not instantiating XGBClassifier(). Should I be? If so, how would I modify my pipeline to use it?

params = {
    'max_depth': 3,
    'objective': 'multi:softmax',  # multiclass objective; predict returns class labels
    'num_class': 3,
    'n_gpus': 0
}

steps = 20  # The number of training iterations

model = xgb.train(params, D_train, steps)
– scarpacci

1 Answer

You can try `pred_p = model.predict_proba(D_test)`

An example I had around (not multi-class though):

import xgboost as xgb
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split

X, y = make_moons(noise=0.3, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.1)

xgb_clf = xgb.XGBClassifier()  # scikit-learn interface, which exposes predict_proba
xgb_clf = xgb_clf.fit(X_train, y_train)

print(xgb_clf.predict(X_test))        # predicted classes
print(xgb_clf.predict_proba(X_test))  # per-class probabilities


[1 1 1 0 1 0 1 0 0 1]
[[0.0394336  0.9605664 ]
 [0.03201818 0.9679818 ]
 [0.1275925  0.8724075 ]
 [0.94218    0.05782   ]
 [0.01464975 0.98535025]
 [0.966953   0.03304701]
 [0.01640552 0.9835945 ]
 [0.9297296  0.07027044]
 [0.9580196  0.0419804 ]
 [0.02849442 0.9715056 ]]
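
Since the example above is binary, here is a minimal multi-class sketch of the same idea (the synthetic make_classification data and all names here are illustrative, not from the original post):

import xgboost as xgb
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

# Synthetic stand-in for a three-class (e.g. Low/Medium/High) problem
X, y = make_classification(n_samples=300, n_classes=3, n_informative=4,
                           random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.1,
                                                    random_state=0)

# With the scikit-learn wrapper, a multiclass objective (multi:softprob)
# is inferred automatically from the number of classes in y_train
xgb_clf = xgb.XGBClassifier(max_depth=3)
xgb_clf.fit(X_train, y_train)

print(xgb_clf.predict(X_test))        # predicted classes, e.g. [2 0 1 ...]
print(xgb_clf.predict_proba(X_test))  # shape (n_samples, 3); each row sums to 1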

Note, as mentioned in the comments by @scarpacci (ref):

predict_proba() method only exists for the scikit-learn interface
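
If you would rather keep your existing xgb.train pipeline, one option is to change the objective to 'multi:softprob': the booster's predict then returns an (n_samples, num_class) array of probabilities instead of hard labels, and you can recover both pieces yourself. A sketch, assuming D_train, D_test, and label_encoder are the objects from your question:

params = {
    'max_depth': 3,
    'objective': 'multi:softprob',  # output per-class probabilities
    'num_class': 3,
}

model = xgb.train(params, D_train, 20)
probs = model.predict(D_test)     # shape (n_samples, 3)
pred_int = probs.argmax(axis=1)   # most likely class per row
labels = label_encoder.inverse_transform(pred_int)

# Build the "Medium - 88%" style output you described
for lbl, p in zip(labels[:5], probs.max(axis=1)[:5]):
    print(f"{lbl} - {p:.0%}")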

– Reveille
  • If the `predict` method runs fine (which it seems to do from your code), so should the `predict_proba` method. The only reason I can think of for a problem is that your case is multi-class. [This](https://stackoverflow.com/questions/57986259/multiclass-classification-with-xgboost-classifier) may help. – Reveille Apr 07 '20 at 16:03
  • I see. You don't want to use the scikit-learn interface for some reason? – Reveille Apr 07 '20 at 16:14
  • I think I will have to modify it to use it. Not opposed; I just had my pipeline built differently. Thanks for the help! – scarpacci Apr 07 '20 at 16:18
  • Perfect! Thank you. Greatly appreciate it. – scarpacci Apr 07 '20 at 16:21
  • My pleasure! I also just added the link you shared above to the answer for better visibility. – Reveille Apr 07 '20 at 16:31