I have used the caret R package to train a neural network and a random forest. Is there a way to compute SHAP values for feature importance?

les2004

1 Answer

Slightly modified from kernelshap's README: https://github.com/ModelOriented/kernelshap

library(caret)
library(kernelshap)
library(shapviz)

fit <- train(
  Sepal.Length ~ . + Species * Sepal.Width, 
  data = iris, 
  method = "lm", 
  tuneGrid = data.frame(intercept = TRUE),
  trControl = trainControl(method = "none")
)

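# Explain every column except the response Sepal.Length
xvars <- colnames(iris[-1])

# X = rows to explain, bg_X = background data, pred_fun = prediction function
s <- kernelshap(fit, X = iris, bg_X = iris, pred_fun = predict, feature_names = xvars)

# Global importance plot, then one dependence plot per feature
sv <- shapviz(s)
sv_importance(sv)
sv_dependence(sv, xvars)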

[Two images: the plots produced by sv_importance(sv) and sv_dependence(sv, xvars)]

Remarks

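  1. Replace the linear model with any other caret model, e.g. a neural network or a random forest.
  2. If the dataset is larger than 500 obs, replace bg_X by a subsample of about 200--500 rows.
  3. Probabilistic classification works similarly: pass a pred_fun that returns the predicted probabilities as a numeric vector or matrix.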
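Remarks 2 and 3 could look roughly like the sketch below. It is not taken from the kernelshap README: the binary target `is_virginica`, the helper name `pred_yes`, and the subsample sizes (shrunk so they fit the 150 rows of iris) are assumptions for illustration only.

```r
library(caret)
library(kernelshap)
library(shapviz)

set.seed(1)

# Hypothetical binary target built from iris (illustration only)
dat <- iris
dat$is_virginica <- factor(ifelse(dat$Species == "virginica", "Yes", "No"))
dat$Species <- NULL

# Any caret classifier that can return class probabilities could go here
fit <- train(
  is_virginica ~ .,
  data = dat,
  method = "glm",
  trControl = trainControl(method = "none", classProbs = TRUE)
)

# pred_fun must be a function of (model, data) returning a numeric vector or matrix
pred_yes <- function(m, x) predict(m, x, type = "prob")$Yes

xvars <- setdiff(colnames(dat), "is_virginica")

# Subsample the rows to explain (X) and the background rows (bg_X)
X <- dat[sample(nrow(dat), 100), xvars]
bg_X <- dat[sample(nrow(dat), 50), xvars]

s <- kernelshap(fit, X = X, bg_X = bg_X, pred_fun = pred_yes)
sv <- shapviz(s)
sv_importance(sv)
```

On a real dataset, the sizes would more likely be around 1000 rows for X and a few hundred for bg_X; larger values mainly cost run time.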
Michael M
  • Because I have a classification problem, I need `predict(model_glm, train, type = 'prob')$Yes` for the predict part of `s`. But I get the error: `Error in kernelshap.default(model_glm, newtr, pred_fun = predict(model_glm, : is.function(pred_fun) is not TRUE`. Can I fix that somehow? – les2004 Jun 15 '23 at 07:48
  • Pass a function rather than its result: `pred_fun = function(m, x) predict(m, x, type = 'prob')$Yes`. Simply test it with `pred_fun(model_glm, train)`. If the result is a numeric vector or matrix, you are safe. – Michael M Jun 15 '23 at 08:24
  • Thanks! If the dataset is larger than 500 obs what are the consequences? Because I would like to see the global feature importance. – les2004 Jun 15 '23 at 08:31
  • A slow progress bar... You would usually subsample both X (explanation rows, 1000 rows) and bg_X (background data, 100 - 500 rows). – Michael M Jun 15 '23 at 10:28
  • Last question: Can I visualize the interactions of variables? – les2004 Jun 15 '23 at 11:01
  • Colors in dependence plots will give you hints about interactions. If you need SHAP interaction decompositions, then you can crunch them with treeshap (github project) for random forests, or easiest via XGBoost. I.e. outside caret. If you provide a specific example, it is easier. – Michael M Jun 15 '23 at 11:54
  • How can I output the interaction plots? When I run the code, the plot in RStudio is very cluttered. – les2004 Jun 16 '23 at 06:05
  • I can only help if there is a self-contained example in the post. – Michael M Jun 16 '23 at 06:11
  • It can work with the example you provided. How can I output the interaction plots in tiff or eps? – les2004 Jun 16 '23 at 07:09
  • How can one cite the kernelshap package? Is there a doi? – les2004 Jun 16 '23 at 14:01
  • citation("kernelshap") in R gives the bibtex :-) – Michael M Jun 16 '23 at 14:25
  • Can I create a beeswarm or a violin plot with the same package? – les2004 Jun 17 '23 at 11:58
  • I'd suggest looking through the "shapviz" README: `sv_importance(sv, kind = "beeswarm")` – Michael M Jun 17 '23 at 13:19
  • Done! Can I change the legend name, so that it reads "Variable value" instead of "Feature value"? – les2004 Jun 17 '23 at 18:03
  • Feel free to do anything possible with ggplot. E.g. `+ labs(color="Variable value")` – Michael M Jun 17 '23 at 18:37
  • Can I set a specific range for the x-axis, e.g. -0.5 to 0.5? I cannot find anything about it in the documentation. – les2004 Jun 21 '23 at 18:38
  • It should be rather clear from the "ggplot2" documentation: `+ coord_cartesian(xlim = c(-0.5, 0.5))`. I can recommend Hadley's free book, which covers a lot of ggplot topics: https://r4ds.had.co.nz/ – Michael M Jun 21 '23 at 18:55
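Putting the plotting tips from the comments together, a minimal sketch might look like this. The plain `lm` stand-in model, the output file name, and the figure size are assumptions for illustration. The `ggsave()` step was not discussed in the thread; it is one possible way to address the earlier TIFF question.

```r
library(kernelshap)
library(shapviz)
library(ggplot2)

# Small stand-in model so the sketch is self-contained (assumption)
fit <- lm(Sepal.Length ~ ., data = iris)
s <- kernelshap(fit, X = iris[-1], bg_X = iris[-1])
sv <- shapviz(s)

# Beeswarm plot with a renamed legend and a fixed SHAP axis range
p <- sv_importance(sv, kind = "beeswarm") +
  labs(color = "Variable value") +
  coord_cartesian(xlim = c(-0.5, 0.5))

# Write the plot to a TIFF file; path and size are placeholders
out_file <- file.path(tempdir(), "shap_beeswarm.tiff")
ggsave(out_file, plot = p, width = 7, height = 5, dpi = 300)
```

Since `sv_importance()` returns a ggplot object, the usual ggplot2 layers and `ggsave()` apply. Changing the file extension to `.eps` should make `ggsave()` pick a PostScript device instead.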