
In time series forecasting, external regressors can make a big difference. I currently want to track the effects of external regressors using the modeltime framework.

However, I have not found any helpful information on this topic so far. I only found out that you can add regressor variables to the recipe formula with a "+".

After adding the variables Transactions (number of transactions per day and store) and Open_Closed (1 = store is closed, 0 = store is open) to my recipe, I found that there was no effect on the prediction. How can I get these regressors to influence the forecast?

Some reprex data:

suppressPackageStartupMessages(library(modeltime))
suppressPackageStartupMessages(library(tidymodels))
suppressPackageStartupMessages(library(lubridate))
suppressPackageStartupMessages(library(timetk))


#### DATA

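# seed before sample() so that Open_Closed is reproducible
set.seed(234)

# two stores with 365 daily observations each; the per-store series are
# repeated explicitly instead of relying on data.frame() recycling
data <- data.frame(
  Store        = rep(c("1", "2"), each = 365),
  Sales        = rep(seq(1, 44, length.out = 365), times = 2),
  Date         = rep(ymd("2013-01-01") + days(0:364), times = 2),
  Transactions = rep(seq(50, 100, length.out = 365), times = 2),
  Open_Closed  = sample(rep(0:1, each = 365))
)

h <- 42  # forecast horizon in days

# split: hold out the last h days as the test set
splits <- time_series_split(data, assess = paste(h, "days"), cumulative = TRUE)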

# recipe
recipe_spec <- recipe(Sales ~ Date + Transactions + Open_Closed, data) %>%
  step_timeseries_signature(Date) %>%
  step_rm(matches("(iso$)|(xts$)|(day)|(hour)|(min)|(sec)|(am.pm)")) %>%
  step_dummy(all_nominal())

recipe_spec %>% prep() %>% juice()


#### MODELS

# elnet
model_spec_glmnet <- linear_reg(penalty = 1) %>%
  set_engine("glmnet")
wflw_fit_glmnet <- workflow() %>%
  add_model(model_spec_glmnet) %>%
  add_recipe(recipe_spec %>% step_rm(Date)) %>%
  fit(training(splits))

# xgboost
model_spec_xgboost <- boost_tree("regression", learn_rate = 0.35) %>%
  set_engine("xgboost")
set.seed(123)
wflw_fit_xgboost <- workflow() %>%
  add_model(model_spec_xgboost) %>%
  add_recipe(recipe_spec %>% step_rm(Date)) %>%
  fit(training(splits))

# sub tbl
submodels_tbl <- modeltime_table(
  wflw_fit_glmnet,
  wflw_fit_xgboost
)

submodels_tbl %>% 
  modeltime_accuracy(testing(splits)) %>%
  table_modeltime_accuracy(.interactive = FALSE)
Leonhard Geisler
I modified the example to make it actually work. Indeed, the accuracy changes when you add Transactions and Open_Closed to the recipe. I just did not see it in the glmnet model, because glmnet performs fairly strict internal feature selection, as described in this book: https://bookdown.org/max/FES/selection.html. Adding the XGBoost model makes the changes visible. In this case the prediction gets worse because of the irrelevant features. – Leonhard Geisler Feb 15 '22 at 13:20
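
The feature selection mentioned in this comment can be reproduced in isolation. What follows is only a minimal sketch, assuming simulated data that mimics the shape of the reprex: `signal` and `noise` are hypothetical stand-ins for a trend-like predictor and an irrelevant 0/1 flag. It calls glmnet directly with `lambda = 1`, which is assumed here to play the same role as `penalty = 1` in the parsnip specification, and is not the workflow fitted in the question.

```r
library(glmnet)

set.seed(1)
n <- 365

# hypothetical predictors: one that tracks the outcome, one irrelevant flag
signal <- seq(50, 100, length.out = n)
noise  <- sample(rep(0:1, length.out = n))
x <- cbind(signal = signal, noise = noise)

# outcome: a linear trend plus a little Gaussian noise
y <- seq(1, 44, length.out = n) + rnorm(n, sd = 0.1)

# lasso (alpha = 1) with a strong penalty, analogous to penalty = 1
fit_strong <- glmnet(x, y, alpha = 1, lambda = 1)
coefs_strong <- as.matrix(coef(fit_strong))
print(coefs_strong)

# the same model with a much weaker penalty, for comparison;
# here the flag may receive a non-zero coefficient
fit_weak <- glmnet(x, y, alpha = 1, lambda = 0.001)
coefs_weak <- as.matrix(coef(fit_weak))
print(coefs_weak)
```

With the strong penalty, the coefficient of the irrelevant flag is expected to be shrunk to exactly zero, so including or dropping that column leaves the glmnet predictions unchanged. A tree-based model such as XGBoost has no such penalty on individual predictors, which would be consistent with the accuracy change becoming visible only there.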
