
I am wondering whether Friedman's H-statistic for the interaction strength between two features should be symmetric. If I understand the linked source correctly, it should be. However, in my application and in the minimum working example below it is not: I would expect the results for rm:crim and crim:rm to be identical, but they differ. The statistic I calculate is $H_{jk}^2$. In the text, the author writes about sampling the data points used for the estimate. Does this sampling explain the asymmetric results? Thanks for your help; see also the source linked below the code.
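For reference, this is the two-way statistic I mean, as I read it from the linked chapter (with $PD$ denoting the partial dependence functions, assumed to be mean-centered, and $x^{(i)}$ the data points used for the estimate):

$$H_{jk}^2 = \frac{\sum_{i=1}^n \left[ PD_{jk}\left(x_j^{(i)}, x_k^{(i)}\right) - PD_j\left(x_j^{(i)}\right) - PD_k\left(x_k^{(i)}\right) \right]^2}{\sum_{i=1}^n PD_{jk}^2\left(x_j^{(i)}, x_k^{(i)}\right)}$$

Swapping $j$ and $k$ only reorders the terms, so the formula itself looks symmetric to me, which is why the differing numbers surprise me.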

library("rpart")
library("iml")

set.seed(42)
# Fit a CART on the Boston housing data set
data("Boston", package = "MASS")
rf <- rpart(medv ~ ., data = Boston)
# Wrap the fitted tree in an iml Predictor object (features only, target medv dropped)
mod <- Predictor$new(rf, data = Boston[-which(names(Boston) == "medv")])

# Measure the interaction strength of every other feature with rm, and with crim
ia <- Interaction$new(mod, feature = "rm")
ia2 <- Interaction$new(mod, feature = "crim")

View(ia$results)
View(ia2$results) 
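
To make the comparison explicit, here is a minimal sketch of what I am checking. I am assuming that $results is a data.frame with columns .feature and .interaction (as in my version of iml) and that Interaction$new accepts a grid.size argument controlling how many data points are sampled for the estimate:

# Pull out the crim/rm pair from both runs (the label order may differ)
ia$results[grepl("crim", ia$results$.feature), ]
ia2$results[grepl("rm", ia2$results$.feature), ]

# Re-run with grid.size set to the full data set, to test whether the
# sampling of grid points explains the asymmetry
ia_full <- Interaction$new(mod, feature = "rm", grid.size = nrow(Boston))
ia2_full <- Interaction$new(mod, feature = "crim", grid.size = nrow(Boston))
ia_full$results[grepl("crim", ia_full$results$.feature), ]
ia2_full$results[grepl("rm", ia2_full$results$.feature), ]

If the two numbers agree in this setting, I would take that as evidence that the asymmetry comes from the sampling; if they still differ, I am probably misreading the definition.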

https://christophm.github.io/interpretable-ml-book/interaction.html
