
I am working on binary classification and trying to explain my model using the SHAP framework.

I am using the logistic regression algorithm, and I would like to explain this model using both KernelExplainer and LinearExplainer.

So I tried the code below, adapted from another SO answer:

from sklearn.linear_model import LogisticRegression
from sklearn.datasets import load_breast_cancer
import shap
from shap import KernelExplainer, Explanation
from shap.plots import waterfall

X, y = load_breast_cancer(return_X_y=True, as_frame=True)

idx = 9
model = LogisticRegression().fit(X, y)
background = shap.maskers.Independent(X, max_samples=100)
explainer = KernelExplainer(model, background)
sv = explainer(X.iloc[[5]])   # pass the row of interest as df
exp = Explanation(
    sv.values[:, :, 1],         # class to explain
    sv.base_values[:, 1],
    data=X.iloc[[idx]].values,  # pass the row of interest as df
    feature_names=X.columns,
)
waterfall(exp[0])

This threw the error shown below:

AssertionError: Unknown type passed as data object: <class 'shap.maskers._tabular.Independent'>

How can I explain logistic regression model using SHAP KernelExplainer and SHAP LinearExplainer?


1 Answer


Calculation-wise the following will do:

from sklearn.linear_model import LogisticRegression
from sklearn.datasets import load_breast_cancer

from shap import LinearExplainer, KernelExplainer, Explanation
from shap.plots import waterfall
from shap.maskers import Independent

X, y = load_breast_cancer(return_X_y=True, as_frame=True)

idx = 9
model = LogisticRegression().fit(X, y)

explainer = KernelExplainer(model.predict, X)
sv = explainer.shap_values(X.loc[[5]])   # pass the row of interest as df

exp = Explanation(sv, explainer.expected_value, data=X.loc[[idx]].values, feature_names=X.columns)
waterfall(exp[0])

[waterfall plot of the KernelExplainer explanation]

Note: KernelExplainer doesn't support maskers; it takes the background dataset directly. And in this case `loc` and `iloc` return the same row.
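
Since KernelExplainer takes the background dataset directly, passing all of `X` triggers the "large background" warning mentioned in the comments and slows the computation down. A minimal sketch (not part of the original answer) of shrinking the background with `shap.sample` (or `shap.kmeans`) and explaining the class-1 probability rather than the hard 0/1 labels that `model.predict` returns:

import shap

# shrink the background: random rows (shap.sample) or cluster centers (shap.kmeans)
bg = shap.sample(X, 100)                     # or: shap.kmeans(X, 10)

# explain the class-1 probability instead of the hard label from model.predict
explainer_p = KernelExplainer(lambda d: model.predict_proba(d)[:, 1], bg)
sv_p = explainer_p.shap_values(X.loc[[5]])   # SHAP values for row 5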

background = Independent(X, max_samples=100)
explainer = LinearExplainer(model, background)
sv = explainer(X.loc[[5]])   # pass the row of interest by index
waterfall(sv[0])

[waterfall plot of the LinearExplainer explanation]

Note that here LinearExplainer already returns an `Explanation` object, so its result can be passed to waterfall "as-is".
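
The x-axis difference raised in the comments below comes from what each explainer targets: the KernelExplainer call above explains `model.predict`, while LinearExplainer explains the raw log-odds margin of the regression. As a minimal sketch (not from the original answer), assuming the `sv` and `model` from the block above, pushing LinearExplainer's reconstructed margin through the sigmoid recovers the class-1 probability:

from scipy.special import expit  # logistic sigmoid

# base value + SHAP values reconstruct the log-odds for the explained row;
# the sigmoid of that sum should match predict_proba for class 1
margin = sv.base_values[0] + sv.values[0].sum()
print(expit(margin))                           # ~ P(class 1) for row 5
print(model.predict_proba(X.loc[[5]])[0, 1])   # should match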

  • Under the kernel explainer, what is `X.loc[[5]]` used for? Is it used as a background point? And I see that you also use `X.loc[[idx]].values` as data. What is the difference between the two? – The Great May 31 '22 at 02:22
  • Btw, for the linear explainer, why is the x-axis of the SHAP plot different? Since we are focusing on binary classification, shouldn't it be the usual 0 to 1 (probability)? Is it possible to change the scale of the linear explainer output (to explain logistic regression, which is binary classification)? – The Great May 31 '22 at 02:24
  • Unfortunately, I get the same error again w.r.t. the waterfall plot for the breast cancer data - `AttributeError: 'Index' object has no attribute 'items'` when I execute the line `Explanation(sv, explainer.expected_value, data=X.loc[[idx]].values, feature_names=X.columns)`. I guess the issue could be with my system or packages etc. – The Great May 31 '22 at 02:26
  • If KernelExplainer doesn't have a background masker, then it will run for a long time, won't it? E.g. if I have to find SHAP values for all the records in the dataframe? – The Great May 31 '22 at 02:51
  • For KernelExplainer, the background dataset is the whole of `X` (you may see a warning message if it's larger than 100). The explanations are only calculated for `X.loc[[idx]]`, i.e. the datapoint of interest by index. For LinearExplainer, the background is calculated by the masker. The difference is due to this fact, I believe. – Sergey Bushmanov May 31 '22 at 02:52
  • Okay, but if we want to find SHAP value explanations for all the data points in the dataframe, it may run for a long time, won't it? My points of interest are all the rows in the dataframe. – The Great May 31 '22 at 02:58
  • For all points -- yes. I was simply following your long-standing logic: find SVs for a datapoint by index. – Sergey Bushmanov May 31 '22 at 03:05
  • Appreciate all your support and your posts on SHAP on SO. I am learning, and it is all useful for my graduate project. So, the linear explainer cannot give us output on the scale of 0 to 1? – The Great May 31 '22 at 03:24
  • KernelExplainer: explains the function `model.predict` (i.e. proba for LogisticRegression). `LinearExplainer`: seems to predict and explain raw. Give me some time, I'll look at how to reconcile the two over the weekend. – Sergey Bushmanov May 31 '22 at 03:29
  • I benefit from your answers on SHAP in this forum. However, I am reaching out to check with you on a quick question. Do you know whether SHAP can be used for less frequently used regressors like HuberRegressor, LassoCV regressor, etc.? Is SHAP the only way to interpret them? I can create a new post and share my example if you have an idea of how this can be done. – The Great Nov 10 '22 at 14:14
  • I am not sure if there exist readily available fast SHAP solutions, but with KernelExplainer you can explain any function, including any ML model. Though KernelExplainer may take some time. – Sergey Bushmanov Nov 12 '22 at 10:11
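
Following up on that last exchange, a minimal sketch (hypothetical example, not from the thread) of KernelExplainer wrapping an arbitrary regressor's predict function, here HuberRegressor on the diabetes dataset:

from sklearn.linear_model import HuberRegressor
from sklearn.datasets import load_diabetes
import shap

# KernelExplainer only needs a callable, so any fitted model's predict works
Xr, yr = load_diabetes(return_X_y=True, as_frame=True)
reg = HuberRegressor(max_iter=500).fit(Xr, yr)
expl = shap.KernelExplainer(reg.predict, shap.sample(Xr, 50))
sv_r = expl.shap_values(Xr.iloc[[0]])        # SHAP values for the first row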