
I have the following data table dt.train, a window length days, and a function varImportance that returns the variable importance of a linear model:

library(data.table)
library(caret)
library(xgboost)
library(zoo)

days <- 50
set.seed(123)
dt.train <- data.table(date = seq(as.Date('2020-01-01'), by = '1 day', length.out = 366),
                       'DE' = rnorm(366, 30, 1), 'windDE' = rnorm(366, 10, 1),
                       'consumptionDE' = rnorm(366, 35, 1), 'nuclearDE' = rnorm(366, 8, 1), 
                       'solarDE' = rnorm(366, 1, 1), check.names = FALSE)

## Variable Importance Function: ##
## LINEAR MODEL: ##
varImportance <- function(data){
    ## Model fitting (no intercept): ##
    lmModel <- stats::lm(DE ~ . - 1, data = data.table(data))
    varimp <- caret::varImp(lmModel)
    ## Return the importances as a single row: ##
    t(varimp)
}

## Iterative Variable Importance for Linear Model: ##
dt.importance <- as.data.frame(zoo::rollapply(dt.train[, !"date"], 
                                              FUN = varImportance,
                                              width = days,
                                              by.column = FALSE,
                                              align = 'left')
)

## Adding date-column again: ##
dt.importance <- cbind(dt.train[1:nrow(dt.importance), .(date)], dt.importance)
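The rolling scheme above can be checked without fitting any model. A base-R sketch, assuming rollapply's defaults of by = 1 and no fill, that enumerates the left-aligned windows and the start date each row of dt.importance is labeled with:

```r
days <- 50                                   # window length, as in the question
dates <- seq(as.Date("2020-01-01"), by = "1 day", length.out = 366)

# Left-aligned windows: window i covers rows i:(i + days - 1)
starts <- seq_len(length(dates) - days + 1)
windows <- lapply(starts, function(i) i:(i + days - 1))

length(windows)          # 317, i.e. nrow(dt.importance)
range(windows[[2]])      # 2 51
dates[starts[1:2]]       # "2020-01-01" "2020-01-02": start date of each window
```

This is why the date column can be re-attached with dt.train[1:nrow(dt.importance), .(date)]: row i of the result belongs to the window that starts on row i.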

Everything works fine here, but I need to do the same for a gradient boosting model. I have already tried it the same way, with the data preparation for the model fitting moved inside the varImportance function:

## Variable Importance function: ##
## GRADIENT BOOSTING: ##
varImportance <- function(data){
  
  ## Create response vector and predictor matrix: ##
  v.trainY <- data$DE
  m.trainData <- as.matrix(data[, c("date", "DE") := list(NULL, NULL)])

  ## Hyper parameter tuning and grid search: ##
  xgb_trcontrol <- caret::trainControl(method = "cv",
                                       number = 3,
                                       allowParallel = TRUE,
                                       verboseIter = TRUE,
                                       returnData = FALSE
  ) 
  
  xgbgrid <- base::expand.grid(nrounds = c(150), # 15000
                               max_depth = c(2),
                               eta = c(0.01),
                               gamma = c(1),
                               colsample_bytree = c(1),
                               min_child_weight = c(2),
                               subsample = c(0.6)
  )

  ## Model fitting: ##
  xgbModel <- caret::train(m.trainData, 
                           v.trainY,
                           trControl = xgb_trcontrol,
                           tuneGrid = xgbgrid,
                           method = "xgbTree"
  )
  
  varimp <- caret::varImp(xgbModel, scale = FALSE)
  importance <- t(varimp$importance)
  
}
## Iterative Variable Importance for Gradient Boosting: ##
dt.importance <- as.data.frame(zoo::rollapply(dt.train, 
                                              FUN = varImportance,
                                              width = days,
                                              by.column = FALSE,
                                              align = 'left')
)

## Adding date-column again: ##
dt.importance <- cbind(dt.train[1:nrow(dt.importance), .(date)], dt.importance)

Unfortunately, this doesn't work for the rolling 50-day windows (error thrown: $ operator is invalid for atomic vectors). The varImp() call inside varImportance does work for the gradient boosting model when I fit the model once, outside the function.

EDIT 1:

Your answer throws the following error with gradient boosting:

[Screenshot of the error message; image not available]

EDIT 2:

When I comment out trControl = xgb_trcontrol, I get the following error:

[Screenshot of the error message; image not available]

MikiK
  • If I understand correctly, the first calculation should be `varImportance(dt.train[1:days])`. However, this returns an error. Am I missing something? – Waldi Mar 22 '21 at 20:23
  • Exactly, the first calculation should be on rows ```1:days```, the second on ```2:(days + 1)```, the third on ```3:(days + 2)```, and so on. This already works for the linear model; if you try it out, it should be clear what I mean. I want to do the same for gradient boosting, and I already tried it (as you can see above), but my version throws an error. Does anyone know another way to make this work, or how to fix this problem? – MikiK Mar 23 '21 at 05:19
  • Thanks for your feedback. "Doesn't work iteratively" might be confusing, as the first iteration doesn't work at all. The first step is to make xgbModel work with varImp before trying to iterate. – Waldi Mar 23 '21 at 05:42
  • If I fit xgbModel on the whole data set and then compute the variable importance of the fitted model with ```varimp <- caret::varImp(xgbModel, scale = FALSE)```, this works fine. So ```varImp``` works fine when it isn't used iteratively inside the function ```varImportance```. – MikiK Mar 23 '21 at 06:13
  • On my system, `varImportance(dt.train)` returns `Non-tree model detected! This function can only be used with tree models.` – Waldi Mar 23 '21 at 06:21
  • No, don't use the ```varImportance``` function; just don't wrap the ```varImportance``` function around it. Fit the model xgbModel and use ```varimp <- caret::varImp(xgbModel, scale = FALSE)```. This gives the variable importance of the fitted model. I have to make the GB model work as well as the LM. How it's done doesn't matter; it doesn't have to be the same varImportance function. – MikiK Mar 23 '21 at 07:24
  • I don't use an R function called ```varImportance```; I use the ```varImp``` function. varImportance in my question is a function I tried to write myself. – MikiK Mar 23 '21 at 07:27
  • `varImp` is the cause of the above error in the `varImportance` function you created. – Waldi Mar 23 '21 at 08:01
  • OK, so what does this mean? Am I not using a tree model? – MikiK Mar 23 '21 at 08:16
  • Perhaps there is too much collinearity for a boosted tree, see https://stackoverflow.com/questions/42670033/r-getting-non-tree-model-detected-this-function-can-only-be-used-with-tree-mo – Waldi Mar 23 '21 at 08:19
  • I don't think collinearity is the problem. – MikiK Mar 23 '21 at 09:15

1 Answer


You need to convert the input back to a data.table, because rollapply passes each window to your function as a matrix: data$DE then fails, and the := syntax only works on a data.table. Also note that your first column is date, so when rollapply converts a window of the data into a matrix, every column is converted to character.
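Both effects can be reproduced in isolation. A small base-R sketch with a hypothetical three-row window: base R's as.matrix() turns a data.frame that contains a Date column into a character matrix, and $ on that matrix fails with the same error as in the question.

```r
# Hypothetical three-row window with the same kind of columns as dt.train
window_df <- data.frame(date = as.Date("2020-01-01") + 0:2,
                        DE = c(30.1, 29.8, 30.4))

m <- as.matrix(window_df)   # the Date column forces a character matrix
print(typeof(m))            # "character"

# `$` only works on lists (data.frame / data.table), not on a matrix
msg <- tryCatch(m$DE, error = function(e) conditionMessage(e))
print(msg)                  # "$ operator is invalid for atomic vectors"

m[, "DE"]                   # column access by name still works on a matrix
```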

Since you don't use date in your function, it's better to drop this column before passing the data to rollapply. If you do want to pass the complete data, you will need to convert everything back from character to numeric. The code below simply drops the date column from the input.
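If you keep the date column, a minimal sketch of the conversion back, assuming the window arrives as a character matrix like the one base R's as.matrix() produces (the toy window is hypothetical): drop date and reset the storage mode. Base R's as.matrix() builds those strings with format(), which by default uses getOption("digits") (7) significant digits, so this round trip can lose precision. That is another reason to drop date before calling rollapply.

```r
# Hypothetical window with a Date column, converted the way rollapply would
window_df <- data.frame(date = as.Date("2020-01-01") + 0:2,
                        DE = c(30.1, 29.8, 30.4),
                        windDE = c(10.2, 9.7, 10.5))
m <- as.matrix(window_df)   # character matrix, e.g. " 9.7" is padded

# Drop the date column, then switch the remaining cells back to numeric
m_num <- m[, colnames(m) != "date", drop = FALSE]
storage.mode(m_num) <- "numeric"   # as.double() also accepts padded " 9.7"
str(m_num)
```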

Here is the working code:

library(data.table)
library(caret)
library(xgboost)
library(zoo)

days <- 50
set.seed(123)
dt.train <- data.table(date = seq(as.Date('2020-01-01'), by = '1 day', length.out = 366),
                       'DE' = rnorm(366, 30, 1), 'windDE' = rnorm(366, 10, 1),
                       'consumptionDE' = rnorm(366, 35, 1), 'nuclearDE' = rnorm(366, 8, 1), 
                       'solarDE' = rnorm(366, 1, 1), check.names = FALSE)

## GRADIENT BOOSTING: ##
varImportance <- function(data){
  
  data = data.table(data.frame(data))
  ## Create response vector and predictor matrix: ##
  v.trainY <- data$DE
  m.trainData <- as.matrix(data[, c("DE") := list(NULL)])
  
  ## Hyper parameter tuning and grid search: ##
  xgb_trcontrol <- caret::trainControl(method = "cv",
                                       number = 3,
                                       allowParallel = TRUE,
                                       verboseIter = TRUE,
                                       returnData = FALSE
  ) 
  
  xgbgrid <- base::expand.grid(nrounds = c(150), # 15000
                               max_depth = c(2),
                               eta = c(0.3),
                               gamma = c(1),
                               colsample_bytree = c(1),
                               min_child_weight = c(2),
                               subsample = c(0.6)
  )
  
  ## Model fitting: ##
  xgbModel <- caret::train(m.trainData, 
                           v.trainY,
                           trControl = xgb_trcontrol,
                           tuneGrid = xgbgrid,
                           method = "xgbTree")
  
  varimp <- caret::varImp(xgbModel, scale = FALSE)
  ## Return the importances as a single row: ##
  t(varimp$importance)
}
## Iterative Variable Importance for Gradient Boosting: ##
dt.importance1 <- as.data.frame(zoo::rollapply(dt.train[, -1], 
                                               FUN = varImportance,
                                               width = days,
                                               by.column = FALSE,
                                               align = 'left')
)

## Adding date-column again: ##
dt.importance <- cbind(dt.train[1:nrow(dt.importance1), .(date)], dt.importance1)
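An alternative to converting the matrix back is to loop over the windows yourself, so the function receives a data.frame and $ keeps working. The sketch below swaps in a hypothetical helper roll_left() (not a zoo function) and, as an assumption made only so it runs without caret or xgboost, uses the absolute t statistics of the no-intercept linear model as a stand-in importance measure. In the real use case FUN would be the gradient boosting varImportance() above, which already accepts a data.frame because it calls data.table(data.frame(data)).

```r
# Hypothetical helper (not part of zoo): applies FUN to every left-aligned
# window of `width` rows and row-binds the results. Each window is passed
# as a data.frame, so `$` and column names keep working inside FUN.
roll_left <- function(df, width, FUN) {
  starts <- seq_len(nrow(df) - width + 1)
  rows <- lapply(starts, function(i) FUN(df[i:(i + width - 1), , drop = FALSE]))
  do.call(rbind, rows)
}

# Stand-in importance function (assumption: |t| statistics of the
# no-intercept linear model), used so the sketch runs without caret/xgboost
lm_importance <- function(data) {
  fit <- lm(DE ~ . - 1, data = data)
  t(abs(summary(fit)$coefficients[, "t value", drop = FALSE]))
}

# Hypothetical toy data with the same layout as dt.train
set.seed(123)
toy <- data.frame(date = as.Date("2020-01-01") + 0:99,
                  DE = rnorm(100, 30, 1), windDE = rnorm(100, 10, 1),
                  solarDE = rnorm(100, 1, 1))

# Drop the date column before rolling, then label each row by its window start
imp_mat <- roll_left(toy[, -1], width = 50, FUN = lm_importance)
rownames(imp_mat) <- NULL
imp <- data.frame(date = toy$date[seq_len(nrow(imp_mat))], imp_mat)
head(imp)
```

With 100 rows and a 50-row window this yields 51 rows, one per window, matching what rollapply(..., width = 50, align = 'left') would return.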
aashish
  • Thank you for your answer. I'll try it out later. Could this also work for a Multivariate Adaptive Regression Spline (MARS) model, which I mentioned in the following question: [link](https://stackoverflow.com/questions/66243782/how-to-iteratively-train-forecast-models-gam-mars-based-on-selected-days)? It uses the same conventions for model fitting as the gradient boosting model. – MikiK Mar 29 '21 at 05:31
  • Using the code from your answer throws an error (see EDIT 1 in the question above). Why does it work for you but not for me? What does the error message mean? – MikiK Mar 29 '21 at 08:56
  • You got the "$ operator is invalid for atomic vectors" error because rollapply passes each window to your function as a matrix, not as a data.table. – aashish Mar 30 '21 at 01:44
  • The second error suggests that you can't use predict on the results of the cross-validation. Comment out trControl = xgb_trcontrol in your code and try again. See another post on this topic: https://stackoverflow.com/questions/52007875/error-while-using-predict-in-xg-boost-in-r – aashish Mar 30 '21 at 02:00
  • When I comment out ```trControl = xgb_trcontrol```, I get the following warning/error (see my second edit in the question). – MikiK Mar 30 '21 at 08:08