
I am facing an error with the Python SHAP library. While creating force plots based on log odds is no problem, I am not able to create force plots based on probabilities. The goal is to have base_values and shap_values that sum up to the predicted probability.

This works:

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import xgboost as xgb
import sklearn
import shap

X, y = shap.datasets.iris()
X_display, y_display = shap.datasets.iris(display=True)

X_train, X_test, y_train, y_test = sklearn.model_selection.train_test_split(X, y, test_size = 0.2, random_state = 42)

#fit xgboost model
params = {
    'objective': "multi:softprob",
    'eval_metric': "mlogloss",
    'num_class': 3
}

xgb_fit = xgb.train(
    params=params,
    dtrain=xgb.DMatrix(data=X_train, label=y_train)
)

#create shap values and perform tests
explainer = shap.TreeExplainer(xgb_fit)
shap_values = explainer.shap_values(X_train)
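As a quick sanity check in log-odds space, the base values plus the per-feature SHAP values should reproduce the booster's raw margins. A minimal sketch of that check, assuming shap's list-per-class output for multiclass XGBoost (the tolerance is an assumption):

#raw per-class margins (log odds) from the booster
margins = xgb_fit.predict(xgb.DMatrix(X_train), output_margin=True)

#for each class: base value + summed SHAP values should equal the raw margin
for k in range(3):
    reconstructed = explainer.expected_value[k] + shap_values[k].sum(axis=1)
    assert np.allclose(reconstructed, margins[:, k], atol=1e-4)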

And this does not work:

explainer = shap.TreeExplainer(
    model=xgb_fit,
    data=X_train,
    feature_perturbation='interventional',
    model_output='probability'
)

(screenshot of the error traceback raised by this call)

Used packages:

matplotlib 3.4.1
numpy 1.20.2
pandas 1.2.4
scikit-learn 0.24.1
shap 0.39.0
xgboost 1.4.1


1 Answer


To see how your raw scores for multiclass classification add up in probability space, try KernelExplainer:

from xgboost import XGBClassifier
from sklearn.model_selection import train_test_split
from shap import datasets, KernelExplainer, force_plot, initjs
from scipy.special import softmax, expit  # not used below; handy if you convert raw scores by hand

initjs()

X, y = datasets.iris()
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
clf = XGBClassifier(random_state=42,
                    eval_metric="mlogloss",
                    use_label_encoder=False)
clf.fit(X_train, y_train)

# KernelExplainer is model-agnostic: explaining predict_proba directly
# yields SHAP values that live in probability space
ke = KernelExplainer(clf.predict_proba, data=X_train)
shap_values = ke.shap_values(X_test)

# force plot for class 1, first test sample
force_plot(ke.expected_value[1], shap_values[1][0], feature_names=X.columns)

(force plot for class 1, first test sample)

Sanity check:

1. Expected result (up to a rounding error):

clf.predict_proba(X_test[:1])
#array([[0.0031177 , 0.9867134 , 0.01016894]], dtype=float32)

2. Base values:

clf.predict_proba(X_train).mean(0)
#array([0.3339472 , 0.34133017, 0.32472247], dtype=float32)

(or, if you wish, np.unique(y_train, return_counts=True)[1] / len(y_train))
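Since KernelExplainer satisfies the local accuracy property, the base value plus the per-feature SHAP values should reconstruct predict_proba for every class and every row, up to the sampling approximation. A minimal sketch of that check (the tolerance is an assumption, not a guarantee):

import numpy as np

proba = clf.predict_proba(X_test)
for k in range(3):
    #base value + summed SHAP values should be close to the predicted probability for class k
    reconstructed = ke.expected_value[k] + shap_values[k].sum(axis=1)
    assert np.allclose(reconstructed, proba[:, k], atol=1e-2)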

– Sergey Bushmanov
  • Awesome, this works! Do you also know how to make this work using xgb.train()? – user9898927 Apr 26 '21 at 12:10
  • Ideally, this should work the way you showed in the question under `And this does not work` (I would personally share your expectation that it works that way). Unfortunately it doesn't, and judging by the GitHub issues it hasn't been working for the last several months. So my quick answer: I don't know, as far as `force_plot` is concerned. Though, depending on your needs, you may try tinkering with the raw scores and converting them to probabilities via a softmax function (sketched below, after the comments). – Sergey Bushmanov Apr 26 '21 at 12:17
  • Thanks, then I will build my own workaround based on your example and hope for a future fix. Maybe in the meantime I will have a deeper look into the source code (https://github.com/slundberg/shap/blob/master/shap/explainers/_tree.py) to evaluate which part of the code causes the error described above. One additional point: your example raises the following warning: "Use subset (sliced data) of np.ndarray is not recommended because it will generate extra copies and increase memory consumption". For the iris dataset this is no problem, but unfortunately for my data it is (see the background-summarization note below). – user9898927 Apr 26 '21 at 12:53
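Below is a minimal sketch of the raw-score route mentioned in the comments above. It assumes the Booster (xgb_fit), TreeExplainer, and log-odds shap_values from the question; the per-class totals are pushed through softmax to recover the multi:softprob probabilities. Note that this converts the totals only: the individual per-feature contributions do not map linearly into probability space.

import numpy as np
from scipy.special import softmax

#total raw (log-odds) prediction per class: base value + summed SHAP values
raw = np.stack(
    [explainer.expected_value[k] + shap_values[k].sum(axis=1) for k in range(3)],
    axis=1
)

#softmax over classes recovers the multi:softprob probabilities
proba = softmax(raw, axis=1)

As for the memory warning in the last comment, the usual remedy with KernelExplainer is to summarize the background data instead of passing the full training set, e.g. KernelExplainer(clf.predict_proba, shap.kmeans(X_train, 50)).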