
I have some fish data that I would like to forecast into the future. I would like to predict 'fishcount' based on two variables (airtemp_f and watertemp_f). Ideally, I would like to use the R package forecast to predict fishcount 2 or 3 periods ahead; however, I don't know how to include airtemp_f and watertemp_f in the model. Below is a small dataset:

 library(forecast)
 library(ggfortify)
 library(ggplot2)
 library(xts)

fish <- structure(list(year = c(2011, 2011, 2011, 2011, 2011, 2011, 2011, 
2011, 2011, 2011, 2012, 2012, 2012, 2012, 2012, 2012, 2012, 2012, 
2012, 2012, 2011, 2011, 2011, 2011, 2011, 2011, 2011, 2011, 2011, 
2011, 2012, 2012, 2012, 2012, 2012, 2012, 2012, 2012, 2012, 2012
), period = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 1, 2, 3, 4, 5, 6, 
7, 8, 9, 10, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 1, 2, 3, 4, 5, 6, 
7, 8, 9, 10), district = c("221", "221", "221", "221", "221", 
"221", "221", "221", "221", "221", "221", "221", "221", "221", 
"221", "221", "221", "221", "221", "221", "222", "222", "222", 
"222", "222", "222", "222", "222", "222", "222", "222", "222", 
"222", "222", "222", "222", "222", "222", "222", "222"), date = structure(c(15158, 
15160, 15162, 15164, 15165, 15167, 15168, 15169, 15172, 15174, 
15512, 15519, 15525, 15529, 15531, 15533, 15535, 15536, 15537, 
15538, 15187, 15190, 15192, 15194, 15197, 15199, 15201, 15203, 
15205, 15207, 15903, 15905, 15908, 15911, 15914, 15916, 15918, 
15919, 15920, 15921), class = "Date"), fishcount = c(2101, 16031, 
13498, 7024, 42569, 36288, 101565, 204305, 235376, 39851, 14879, 
24512, 97382, 109688, 164938, 182892, 115047, 203842, 247499, 
33879, 89551, 25576, 61377, 4517, 0, 11739, 22318, 69831, 2895, 
16720, 349586, 136904, 365634, 369484, 1054650, 1009362, 1080558, 
671706, 631603, 1007896), airtemp_f = c(54.95, 56.15, 54.1325, 
53.3975, 54.1775, 53.945, 54.62, 54.0773913043478, 56.63, 54.7625, 
50.8025, 49.6625, 49.8575, 49.3775, 49.55, 49.49, 50.0525, 49.775, 
49.6775, 50.795, 57.8675, 53.9225, 53.5475, 51.905, 58.8875, 
55.0475, 54.185, 56.24, 53.915, 54.1325, 56.8154545454545, 58.6021052631579, 
60.5381818181818, 58.084347826087, 57.6930434782609, 56.9808695652174, 
59.3109090909091, 57.8136363636364, 174.548, 56.1623529411765
), watertemp_f = c(56.735, 57.2225, 56.4125, 55.5275, 54.6575, 
54.7625, 54.4475, 53.7095652173913, 55.6925, 53.09, 50, 51.635, 
52.61, 51.0425, 51.095, 50.63, 50.825, 51.065, 50.8625, 52.25, 
59.7425, 55.9325, 55.67, 54.6575, 55.2575, 54.8375, 55.7525, 
56.78, 55.985, 55.595, 59.09, 59.4263157894737, 59.7690909090909, 
58.7417391304348, 59.7513043478261, 60.424347826087, 61.2172727272727, 
59.9163636363636, 58.676, 58.2588235294118)), row.names = c(NA, 
-40L), class = c("tbl_df", "tbl", "data.frame"))

 head(fish)
   year period district date       fishcount airtemp_f watertemp_f
  <dbl>  <dbl> <chr>    <date>         <dbl>     <dbl>       <dbl>
1  2011      1 221      2011-07-03      2101      55.0        56.7
2  2011      2 221      2011-07-05     16031      56.2        57.2
3  2011      3 221      2011-07-07     13498      54.1        56.4
4  2011      4 221      2011-07-09      7024      53.4        55.5
5  2011      5 221      2011-07-10     42569      54.2        54.7
6  2011      6 221      2011-07-12     36288      53.9        54.8

This is my attempt:

#convert fish to xts or ts?
count <- as.xts(fish$fishcount,order.by=seq(as.Date("2011-07-03"),by=2,len=40))
d.arima <- auto.arima(count)
d.forecast <- forecast(d.arima, level = c(95), h = 3)
d.forecast

Question: How do I include airtemp_f and watertemp_f in the model to predict by period, and how do I plot the result with ggplot?

Thanks in advance for any help.

Salvador

1 Answer


How do I include airtemp_f and watertemp_f into the model

auto.arima() has an xreg argument for external regressors:

d.arima <- auto.arima(
  count, 
  xreg = as.matrix(fish[, c("airtemp_f", "watertemp_f")])
)

predict by period

I'm not familiar with the package, but h seems to be ignored once you pass xreg to forecast(): the forecast horizon is the number of rows in xreg instead.
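
To make the point about h concrete, here is a minimal sketch on simulated data (not the fish data). The names x_train, x_future, air and water are made up for illustration; the only assumption about the package is that forecast() takes its horizon from the rows of xreg, as described above.

```r
library(forecast)

set.seed(1)
n <- 60

# Simulated regressors (hypothetical stand-ins for airtemp_f / watertemp_f).
x_train <- cbind(air = rnorm(n, 55, 2), water = rnorm(n, 56, 2))

# Simulated response that depends on both regressors plus noise.
y <- ts(100 + 5 * x_train[, "air"] + 3 * x_train[, "water"] + rnorm(n))

# Regression with ARIMA errors: xreg must have one row per observation.
fit <- auto.arima(y, xreg = x_train)

# Future regressor values: one row per period to forecast,
# with the same column names as the training matrix.
x_future <- cbind(air = c(54, 55, 56), water = c(55, 56, 57))

# h = 10 is passed on purpose; the horizon should follow nrow(x_future).
fc <- forecast(fit, xreg = x_future, h = 10, level = 95)

# Number of forecast periods actually produced.
length(fc$mean)
```

In other words, to forecast 2 or 3 periods ahead you need 2 or 3 rows of future airtemp_f and watertemp_f values, either assumed scenarios or forecasts of the temperatures themselves.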

I assume it forecasts from the end of the training period, so let's wrap the call in a function that produces a fresh h-step forecast for a given pair of temperatures, holding both constant over the horizon:

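fcst <- function(airtemp_f, watertemp_f, h = 3) {
  # Hold both temperatures constant for h periods; the number of rows
  # in xreg determines how many periods are forecast.
  future_xreg <- cbind(
    airtemp_f = rep(airtemp_f, h),
    watertemp_f = rep(watertemp_f, h)
  )
  # Convert the forecast to a data frame and keep its row names
  # (the forecast time index) as a "period" column.
  tibble::rownames_to_column(
    as.data.frame(forecast(d.arima, level = 95, xreg = future_xreg)),
    var = "period"
  )
}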

how do I plot it in ggplot?

There are many ways, but you'll have to choose which values of the regressors to forecast at, e.g.:

library(magrittr)  # provides the %>% pipe

# Take the 5th, 50th (median), and 95th percentiles of each temperature.
tidyr::crossing(
  airtemp_f = quantile(fish$airtemp_f, c(0.05, 0.5, 0.95)),
  watertemp_f = quantile(fish$watertemp_f, c(0.05, 0.5, 0.95))
) %>%
  # Run the forecast with our defined function.
  dplyr::mutate(fcst = purrr::map2(airtemp_f, watertemp_f, fcst)) %>%
  # Flatten the data frame we got from the forecast.
  tidyr::unnest(fcst) %>%
  # Plot the results in facets.
  ggplot(aes(period, group = 1)) +
  facet_grid(airtemp_f ~ watertemp_f) +
  geom_ribbon(aes(ymin = `Lo 95`, ymax = `Hi 95`), alpha = 0.5) +
  geom_line(aes(y = `Point Forecast`))
Robin Gertenbach
  • @Robin-- Thanks for your suggestion. I am going through your code right now. To better understand it: is this predicting water and air temps? What I want to predict is fishcount over time. The date variable should be on the x axis and fishcount on the y axis. – Salvador Aug 24 '21 at 22:16