
Let's say I'm trying to predict the variable y four months into the future using a dynamic ARIMA regression. I know the xreg variables for those four months in advance. I'm not entirely sure how the forecast function makes its forecasts. For example, if I feed it missing y values together with just the xreg values for those months, will it automatically assume that I want to forecast the four months right after the training period?

Does the code below make sense for forecasting the next four months?

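library(dplyr)
library(fable)
library(tsibble)

set.seed(1)
r <- rnorm(36)   # noise behind y and the regressors for the 36 observed months
r2 <- rnorm(4)   # noise behind the regressors for the 4 months to forecast

# 40 monthly rows (2017 Jan to 2020 Apr); y is NA for the last four
x <- data.frame(index = yearmonth(seq.Date(as.Date("2017-01-01"),
                                           as.Date("2020-04-01"),
                                           "1 month")),
                y = cumprod(c(r, rep(NA, 4))),
                a = c(1.8 * r + rnorm(36), 1.8 * r2 + rnorm(4)),
                b = c(0.5 * r + rnorm(36), 1.5 * r2 + rnorm(4))) %>% 
  as_tsibble()

# Fit on all 40 rows, including the four with missing y
a1 <- x %>% 
  model(ARIMA(y ~ a + b))

# Pass the last four rows (with their regressors) as the new data
a1 %>% forecast(x[37:40, ])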
kkz

1 Answer


No. The forecast function assumes that you want to forecast the months after the training data. If your training data ends with missing observations, it simply forecasts from the last available observation, through the missing period, and then into the period after the training data. In your code the training data includes the four months with missing y, so the model forecasts beyond those months rather than filling them in.

Here is some code to do what you want.

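x %>%
  # Train only on the months where y is observed (up to 2019 Dec)
  filter(index <= yearmonth("2019 Dec")) %>%
  model(ARIMA(y ~ a + b)) %>%
  # Forecast the four following months using their known regressors
  forecast(new_data = filter(x, index > yearmonth("2019 Dec")))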
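
The same split can also be made on whether y is observed, rather than on a hard-coded date. This is only a sketch under assumptions: the names train, future, fit and fc are illustrative and not from the original post, and it assumes the missing y values occur only at the end of the series.

```r
library(dplyr)
library(fable)
library(tsibble)

# Rebuild the example data from the question
set.seed(1)
r  <- rnorm(36)
r2 <- rnorm(4)
x <- data.frame(index = yearmonth(seq.Date(as.Date("2017-01-01"),
                                           as.Date("2020-04-01"),
                                           "1 month")),
                y = cumprod(c(r, rep(NA, 4))),
                a = c(1.8 * r + rnorm(36), 1.8 * r2 + rnorm(4)),
                b = c(0.5 * r + rnorm(36), 1.5 * r2 + rnorm(4))) %>%
  as_tsibble(index = index)

# Training data: only the months where y is observed
train <- x %>% filter(!is.na(y))

# Future data: the months with missing y; keep the index and the
# regressors a and b, and drop the all-NA response column
future <- x %>% filter(is.na(y)) %>% select(-y)

# Fit on the observed period, then forecast over the future rows
fit <- train %>% model(ARIMA(y ~ a + b))
fc  <- fit %>% forecast(new_data = future)

# Inspect the months the forecasts are labelled with
fc$index
```

Dropping y from future is a cautious choice here, not a requirement stated in the answer; the code above passes the rows with y still included.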
Rob Hyndman