Let's say I have a dataset with many variables (more than in the reproducible example below) and I want to build a simple and interpretable model, a GLM.
I can first fit an xgboost model and look at the variable importance (which depends on the frequency and the gain of each variable across the successive decision trees) to select the 10 most influential variables:
library(dplyr)
library(xgboost)
# data
data(mtcars)
dtrain <- xgb.DMatrix(
  data = mtcars %>% select(-am) %>% as.matrix(),
  label = mtcars$am
)
# xgboost parameters
xgb_params <- list(
  objective = "binary:logistic",
  eta = 0.1,
  max_depth = 2
)
# xgboost fit
xgb_mod <- xgb.train(
  data = dtrain,
  params = xgb_params,
  nrounds = 10,
  eval_metric = "auc",
  maximize = TRUE
)
# feature importance
xgb.importance(feature_names = dimnames(dtrain)[[2]], model = xgb_mod)
# Feature Gain Cover Frequency
# 1: wt 0.53965838 0.46589322 0.47619048
# 2: gear 0.41691383 0.37360220 0.28571429
# 3: qsec 0.03215627 0.11810252 0.19047619
# 4: hp 0.01127152 0.04240205 0.04761905
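To make the next step concrete, here is a minimal sketch of the main-effects GLM I would build from this importance table (imp, top_vars and glm_mod are just illustrative names; on this toy data the fit may warn about separation):

# build the simple GLM from the top features by Gain
imp <- xgb.importance(feature_names = dimnames(dtrain)[[2]], model = xgb_mod)
top_vars <- head(imp$Feature, 4)  # wt, gear, qsec, hp
glm_mod <- glm(reformulate(top_vars, response = "am"),
               data = mtcars, family = binomial())
summary(glm_mod)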
Question: is there a way to highlight the most significant 2d-interactions according to the xgboost model?

According to the feature importance, I can build a GLM with 4 variables (wt, gear, qsec, hp), but I would like to know whether some 2d-interaction (for instance wt:hp) would be worth adding to such a simple model.