
I want to fit a linear regression model with `TSLM()` from the fable package on a tsibble, and I have a number of dummy variables that I want to include in the analysis. A sample dataset would be the following:

library(tsibble)
library(dplyr)
library(fable)

ex = structure(list(id = c("KEY1", "KEY1", "KEY1", "KEY1", "KEY1", 
"KEY1", "KEY1", "KEY1", "KEY1", "KEY1", "KEY1", "KEY1", "KEY1", 
"KEY1", "KEY1"), sales = c(0, 5, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
0, 0, 0), date = structure(c(15003, 15004, 15005, 15006, 15007, 
15008, 15009, 15010, 15011, 15012, 15013, 15014, 15015, 15016, 
15017), class = "Date"), wday = c(1L, 2L, 3L, 4L, 5L, 6L, 7L, 
1L, 2L, 3L, 4L, 5L, 6L, 7L, 1L), dummy_1 = c(0, 0, 0, 1, 0, 0, 
0, 0, 0, 0, 0, 0, 0, 0, 0), dummy_2 = c(0, 0, 0, 0, 0, 0, 1, 
0, 0, 0, 0, 0, 0, 0, 0), dummy_3 = c(0, 0, 1, 0, 0, 0, 0, 0, 
0, 0, 0, 0, 0, 0, 0)), row.names = c(NA, -15L), key = structure(list(
    id = "KEY1", .rows = list(1:15)), row.names = c(NA, -1L), class = c("tbl_df", 
"tbl", "data.frame"), .drop = TRUE), index = structure("date", ordered = TRUE), index2 = "date", interval = structure(list(
    year = 0, quarter = 0, month = 0, week = 0, day = 1, hour = 0, 
    minute = 0, second = 0, millisecond = 0, microsecond = 0, 
    nanosecond = 0, unit = 0), class = "interval"), class = c("tbl_ts", 
"tbl_df", "tbl", "data.frame"))

> ex
# A tsibble: 15 x 7 [1D]
# Key:       id [1]
   id    sales date        wday dummy_1 dummy_2 dummy_3
   <chr> <dbl> <date>     <int>   <dbl>   <dbl>   <dbl>
 1 KEY1      0 2011-01-29     1       0       0       0
 2 KEY1      5 2011-01-30     2       0       0       0
 3 KEY1      0 2011-01-31     3       0       0       1
 4 KEY1      0 2011-02-01     4       1       0       0
 5 KEY1      0 2011-02-02     5       0       0       0
 6 KEY1      0 2011-02-03     6       0       0       0
 7 KEY1      0 2011-02-04     7       0       1       0
 8 KEY1      0 2011-02-05     1       0       0       0
 9 KEY1      0 2011-02-06     2       0       0       0
10 KEY1      0 2011-02-07     3       0       0       0
11 KEY1      0 2011-02-08     4       0       0       0
12 KEY1      0 2011-02-09     5       0       0       0
13 KEY1      0 2011-02-10     6       0       0       0
14 KEY1      0 2011-02-11     7       0       0       0
15 KEY1      0 2011-02-12     1       0       0       0 

There are too many dummies to specify manually, so I was hoping for something faster. Normally I would use the `.` symbol in the formula, like this:

fit = ex %>% 
  model(TSLM(sales ~ trend() + season() + .))

But this does not work:

Warning message:
1 error encountered for TSLM(sales ~ trend() + season() + .)
[1] '.' in formula and no 'data' argument

Is there a systematic tsibble way around this, or do I have to build the formula on the fly from the dataset's column names?

User2321
  • What is `model`? – akrun May 03 '20 at 19:55
  • I meant when I use the code, I get `Error in model(., TSLM(sales ~ trend() + season() + .)) : could not find function "model"` – akrun May 03 '20 at 20:00
  • I tried their example in the documentation. it is giving me some errors. may be package version? `as_tsibble(USAccDeaths) %>% + model(lm = TSLM(log(value) ~ trend() + season()))` – akrun May 03 '20 at 20:09
  • error is `Error: Can't cast to ` – akrun May 03 '20 at 20:10
  • It's definitely one of your packages; for me the code works. I had the same issues at the beginning. For me it was the `vctrs` package, but I don't know if it's the same for you... – User2321 May 03 '20 at 20:12
  • Maybe I need to update those versions. Thanks. – akrun May 03 '20 at 20:13
  • Can you try `nm1 <- names(ex)[startsWith(names(ex), 'dummy')];ex %>% model(lm = TSLM(reformulate(c(nm1, 'trend()', 'season()'), 'sales') ))` – akrun May 03 '20 at 20:18
  • The issue with using `.` is that `.` is the whole dataset coming from the `%>%`, so it could result in an error. – akrun May 03 '20 at 20:26
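For context, the error text in the question is exactly the message base R's `terms()` raises when a formula contains `.` but no data frame is supplied to expand it against. The base-R sketch below (no fable needed; the small data frame `toy` is hypothetical) shows the difference. That `TSLM()` parses its formula without passing the tsibble along as `data` is an assumption inferred from the message, not something confirmed here.

```r
# Base R only: illustrates why `.` needs a data argument to be expanded.
# `toy` is a hypothetical stand-in for a few columns of the tsibble.
toy <- data.frame(
  sales   = c(0, 5, 0, 0),
  dummy_1 = c(0, 0, 0, 1),
  dummy_2 = c(0, 0, 1, 0)
)

# Without data, terms() cannot know which columns `.` stands for.
res <- tryCatch(terms(sales ~ .), error = function(e) conditionMessage(e))
print(res)  # "'.' in formula and no 'data' argument"

# With data, `.` expands to every column except the response.
tt <- terms(sales ~ ., data = toy)
print(attr(tt, "term.labels"))
```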

1 Answer


We could create the formula with `reformulate()`, using the column names that start with 'dummy':

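# names of all columns starting with "dummy"
nm1 <- names(ex)[startsWith(names(ex), 'dummy')]
# build sales ~ dummy_1 + ... + trend() + season() and fit it
ex %>%
    model(lm = TSLM(reformulate(c(nm1, 'trend()', 'season()'), 'sales')))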
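To see what `reformulate()` actually hands to `TSLM()`, the formula can be built on its own in base R. This sketch copies the column names of `ex` into a plain vector so it runs without tsibble or fable. The `setdiff()` variant is an alternative, not part of the answer: it assumes every column other than the key, response, index and `wday` should be a regressor, which avoids relying on a naming prefix.

```r
# Column names copied from the example tsibble `ex` (in practice: names(ex)).
cols <- c("id", "sales", "date", "wday", "dummy_1", "dummy_2", "dummy_3")

# The answer's approach: keep columns whose names start with "dummy".
nm1 <- cols[startsWith(cols, "dummy")]

# Alternative (assumption): drop the non-regressor columns instead.
nm2 <- setdiff(cols, c("id", "sales", "date", "wday"))

# reformulate() pastes the labels together and parses them, so special
# terms such as trend() and season() stay as function calls.
f <- reformulate(c(nm1, "trend()", "season()"), response = "sales")
print(f)
# sales ~ dummy_1 + dummy_2 + dummy_3 + trend() + season()
```

Because the labels are parsed as R code, any name that is not syntactically valid (for example one containing a space) would need backticks inside the string.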
akrun