I have the following data table dt.train, a number of days (days), and the function varImportance, which computes the variable importance of a linear model:
library(data.table)
library(caret)
library(xgboost)
library(zoo)
days <- 50
set.seed(123)
dt.train <- data.table(date = seq(as.Date('2020-01-01'), by = '1 day', length.out = 366),
                       'DE' = rnorm(366, 30, 1), 'windDE' = rnorm(366, 10, 1),
                       'consumptionDE' = rnorm(366, 35, 1), 'nuclearDE' = rnorm(366, 8, 1),
                       'solarDE' = rnorm(366, 1, 1), check.names = FALSE)
## Variable Importance Function: ##
## LINEAR MODEL: ##
varImportance <- function(data){
  ## Model fitting: ##
  xgbModel <- stats::lm(DE ~ .-1, data = data.table(data))
  varimp <- caret::varImp(xgbModel)
  importance <- t(varimp)
}
## Iterative Variable Importance for Linear Model: ##
dt.importance <- as.data.frame(zoo::rollapply(dt.train[, !"date"],
                                              FUN = varImportance,
                                              width = days,
                                              by.column = FALSE,
                                              align = 'left')
)
## Adding date-column again: ##
dt.importance <- cbind(dt.train[1:nrow(dt.importance), .(date)], dt.importance)
Everything works fine here, but I need to do the same for a gradient boosting model. I have already tried to do it the same way, with the preparation for the model fitting placed within the varImportance function:
## Variable Importance function: ##
## GRADIENT BOOSTING: ##
varImportance <- function(data){
  ## Create response vector and predictor matrix: ##
  v.trainY <- data$DE
  m.trainData <- as.matrix(data[, c("date", "DE") := list(NULL, NULL)])
  ## Hyper parameter tuning and grid search: ##
  xgb_trcontrol <- caret::trainControl(method = "cv",
                                       number = 3,
                                       allowParallel = TRUE,
                                       verboseIter = TRUE,
                                       returnData = FALSE
  )
  xgbgrid <- base::expand.grid(nrounds = c(150), # 15000
                               max_depth = c(2),
                               eta = c(0.01),
                               gamma = c(1),
                               colsample_bytree = c(1),
                               min_child_weight = c(2),
                               subsample = c(0.6)
  )
  ## Model fitting: ##
  xgbModel <- caret::train(m.trainData,
                           v.trainY,
                           trControl = xgb_trcontrol,
                           tuneGrid = xgbgrid,
                           method = "xgbTree"
  )
  varimp <- caret::varImp(xgbModel, scale = FALSE)
  importance <- t(varimp$importance)
}
## Iterative Variable Importance for Gradient Boosting: ##
dt.importance <- as.data.frame(zoo::rollapply(dt.train,
                                              FUN = varImportance,
                                              width = days,
                                              by.column = FALSE,
                                              align = 'left')
)
## Adding date-column again: ##
dt.importance <- cbind(dt.train[1:nrow(dt.importance), .(date)], dt.importance)
Unfortunately, this doesn't work iteratively for each 50-day window; it throws the error "$ operator is invalid for atomic vectors". The varImp() call within the varImportance function does work for the gradient boosting model when it is run only once on the data.
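To narrow down where the error might come from, here is a minimal sketch that probes what rollapply actually hands to FUN when by.column = FALSE. The toy data frame and the helper inspect_window are hypothetical and only for illustration; the assumption being checked is that each window arrives as a plain matrix (a character matrix when a date column is included) rather than as a data.table, which would explain why data$DE fails while data.table(data) in the linear version works:

```r
library(zoo)

set.seed(1)
# Toy stand-in for dt.train (hypothetical): one Date column, two numeric columns
toy <- data.frame(date   = seq(as.Date("2020-01-01"), by = "1 day", length.out = 6),
                  DE     = rnorm(6),
                  windDE = rnorm(6))

# Hypothetical probe: report the type and size of each window passed to FUN
inspect_window <- function(data) {
  c(is_matrix    = is.matrix(data),     # TRUE means `$` cannot be used on it
    is_character = is.character(data),  # TRUE means the numbers were coerced to strings
    n_rows       = nrow(data))
}

res <- zoo::rollapply(toy,
                      FUN = inspect_window,
                      width = 3,
                      by.column = FALSE,
                      align = "left")
print(res)
```

If the probe reports a character matrix for every window, then the date column would have to be dropped before calling rollapply (as in the linear example), and the window converted back with data.table(data) inside the function before using $ or :=.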
EDIT 1:
Your answer throws the following error with gradient boosting:
EDIT 2:
When I comment out trControl = xgb_trcontrol, I get the following error: