I have used the caret R package to train a neural network and a random forest. Is there any way to obtain SHAP values for feature importance?
-
Could you please share some reproducible data using `dput`? – Quinten Jun 14 '23 at 12:50
-
See https://stackoverflow.com/questions/71737056/shap-values-from-caret: what are the classes of the models that you get back? – Ben Bolker Jun 14 '23 at 12:52
-
Classification models. – les2004 Jun 15 '23 at 07:13
1 Answer
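Slightly modified from kernelshap's README (https://github.com/ModelOriented/kernelshap):

    library(caret)
    library(kernelshap)
    library(shapviz)

    # Fit a linear model with an interaction term via caret (no resampling)
    fit <- train(
      Sepal.Length ~ . + Species * Sepal.Width,
      data = iris,
      method = "lm",
      tuneGrid = data.frame(intercept = TRUE),
      trControl = trainControl(method = "none")
    )

    # Feature columns: everything except the response Sepal.Length
    xvars <- colnames(iris[-1])

    # Kernel SHAP values; pred_fun is named explicitly to avoid positional mix-ups
    s <- kernelshap(fit, X = iris, bg_X = iris, pred_fun = predict, feature_names = xvars)

    # Plot importance and dependence with shapviz
    sv <- shapviz(s)
    sv_importance(sv)
    sv_dependence(sv, xvars)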
Remarks
- Replace the linear model with any other model.
- If the dataset has more than 500 observations, replace `bg_X` with a subsample of about 200 to 500 rows.
- Probabilistic classification works similarly.
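
The subsampling remark could be sketched as follows. This is only an illustration under assumptions: it reuses the iris model from the answer, and the sample sizes (100 explanation rows, 50 background rows) are hypothetical values chosen because iris has just 150 rows. On a real dataset, something closer to the sizes mentioned in the remarks would be more appropriate.

```r
library(caret)
library(kernelshap)
library(shapviz)

set.seed(1)  # make the subsamples reproducible

# Same illustrative model as in the answer
fit <- train(
  Sepal.Length ~ . + Species * Sepal.Width,
  data = iris,
  method = "lm",
  tuneGrid = data.frame(intercept = TRUE),
  trControl = trainControl(method = "none")
)
xvars <- colnames(iris[-1])

# Hypothetical sizes, kept small because iris is tiny
X_explain <- iris[sample(nrow(iris), 100), ]  # rows to explain
bg_X <- iris[sample(nrow(iris), 50), ]        # background data

s <- kernelshap(
  fit,
  X = X_explain,
  bg_X = bg_X,
  pred_fun = predict,
  feature_names = xvars
)

# Global importance from the explained subsample
sv <- shapviz(s)
sv_importance(sv)
```

Runtime grows with both the number of explained rows and the number of background rows, which is why both are subsampled here.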

Michael M
-
For the predict part of `s`, because I have a classification problem, I need `predict(model_glm, train, type = 'prob')$Yes`. But I get the error: `Error in kernelshap.default(model_glm, newtr, pred_fun = predict(model_glm, : is.function(pred_fun) is not TRUE`. Can I fix that somehow? – les2004 Jun 15 '23 at 07:48
-
`pred_fun` must be a function, not its result: `pred_fun = function(m, x) predict(m, x, type = 'prob')$Yes`. Simply test it with `pred_fun(model_glm, train)`. If the result is a numeric vector or matrix, you are safe. – Michael M Jun 15 '23 at 08:24
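
A minimal sketch of that advice for a binary caret classifier is shown below. The data are an assumption: `mtcars` with `am` recoded to "No"/"Yes" stands in for the asker's data, and `model_glm` merely mirrors the name used in the comment above.

```r
library(caret)
library(kernelshap)

# Stand-in data (assumption): binary outcome with levels "No" and "Yes"
dat <- mtcars[c("am", "hp", "wt")]
dat$am <- factor(ifelse(dat$am == 1, "Yes", "No"), levels = c("No", "Yes"))

model_glm <- train(
  am ~ hp + wt,
  data = dat,
  method = "glm",
  trControl = trainControl(method = "none")
)

# pred_fun must be a function of (model, data) returning numeric predictions
pred_fun <- function(m, x) predict(m, x, type = "prob")$Yes

# Sanity check: should be a numeric vector of probabilities
p <- pred_fun(model_glm, dat)
is.numeric(p)

s <- kernelshap(
  model_glm,
  X = dat,
  bg_X = dat,
  pred_fun = pred_fun,
  feature_names = c("hp", "wt")
)
```

Passing the function itself, rather than the result of calling `predict()`, is what avoids the `is.function(pred_fun) is not TRUE` error.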
-
Thanks! What are the consequences if the dataset is larger than 500 observations? I ask because I would like to see the global feature importance. – les2004 Jun 15 '23 at 08:31
-
A slow progress bar... You would usually subsample both X (explanation rows, 1000 rows) and bg_X (background data, 100 - 500 rows). – Michael M Jun 15 '23 at 10:28
-
Colors in dependence plots will give you hints about interactions. If you need SHAP interaction decompositions, then you can crunch them with treeshap (github project) for random forests, or easiest via XGBoost. I.e. outside caret. If you provide a specific example, it is easier. – Michael M Jun 15 '23 at 11:54
-
How can I output the interaction plots? When I run the code, the plot in RStudio is very cluttered. – les2004 Jun 16 '23 at 06:05
-
It can work with the example you provided. How can I output the interaction plots in tiff or eps? – les2004 Jun 16 '23 at 07:09
-
I'd suggest looking through shapviz's README: `sv_importance(sv, kind = "bee")` – Michael M Jun 17 '23 at 13:19
-
Done! Can I change the legend name, so that it reads "Variable value" instead of "Feature value"? – les2004 Jun 17 '23 at 18:03
-
Feel free to do anything possible with ggplot. E.g. `+ labs(color="Variable value")` – Michael M Jun 17 '23 at 18:37
-
Can I set a specific range for the x-axis, e.g. -0.5 to 0.5? I cannot find anything about it in the documentation. – les2004 Jun 21 '23 at 18:38
-
From ggplot2's documentation, it should be rather clear: `+ coord_cartesian(xlim = c(-0.5, 0.5))`. I can recommend Hadley's free book, which covers a lot of ggplot topics: https://r4ds.had.co.nz/ – Michael M Jun 21 '23 at 18:55
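
The plot tweaks suggested in the comments, together with the earlier question about TIFF and EPS export, could be combined roughly as below. This is a sketch under assumptions: it reuses the iris model from the answer, the file names and plot dimensions are hypothetical, and `ggsave()` from ggplot2 is one possible export route rather than something the thread itself prescribes.

```r
library(caret)
library(kernelshap)
library(shapviz)
library(ggplot2)

# Same illustrative model and SHAP values as in the answer
fit <- train(
  Sepal.Length ~ . + Species * Sepal.Width,
  data = iris,
  method = "lm",
  tuneGrid = data.frame(intercept = TRUE),
  trControl = trainControl(method = "none")
)
xvars <- colnames(iris[-1])
s <- kernelshap(fit, X = iris, bg_X = iris, pred_fun = predict, feature_names = xvars)
sv <- shapviz(s)

# Beeswarm importance plot with a renamed legend and a fixed x-axis range
p <- sv_importance(sv, kind = "bee") +
  labs(color = "Variable value") +
  coord_cartesian(xlim = c(-0.5, 0.5))

# Hypothetical file names and sizes; the device is inferred from the extension
ggsave("shap_beeswarm.tiff", plot = p, width = 6, height = 4, dpi = 300)
ggsave("shap_beeswarm.eps", plot = p, width = 6, height = 4)
```

`coord_cartesian()` zooms without dropping points outside the range, whereas setting limits on the scale would remove them before plotting.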