Context first, questions at the bottom.
I have 10 years of daily precipitation data that exhibits an annual seasonality, which I am trying to model using ARMA methods and then forecast. Data here, time series object creation below.
I am aware that common R packages and functions struggle with daily time series. For example, the arima() function of Forecast does not accept frequencies above 350, and ts() does not accept non-integer values for frequencies (both of which would be useful as the average number of days in a year is 365.25).
Apparently, the msts() function of Forecast can accept a non-integer values for the seasonal.perdiods argument, so I have created my time series like so:
the_ts <- msts(data$PRCP, start=c(2007, 10), end=c(2017, 9), seasonal.periods=c(365.25))
Time Series:
Start = c(2007, 10)
End = c(2017, 9)
Frequency = 365
[1] 0.09 0.75 1.63 0.06 0.36 0.63 0.76 0.43 0.13 0.00 0.00 0.02 0.31 1.80 0.03 0.19 0.25 0.01 0.00 0.52 0.01 0.00 0.00 0.00 0.00 ... etc
--
plot(the_ts)
This series is stationary, so it does not need to be differenced.
I then wish to decompose the series, to extract the seasonality and trend, leaving the residuals left over, which if successful, should approximate white noise.
Below is the plotted decomposition. I'v also tried to use decompose() and played with numerous parameter adjustments.
the_ts_decomp = stl(the_ts, s.window = "periodic")
plot(the_ts_decomp)
Apparently, there is seasonality of some type left over in the residuals, as the shape of the seasonal trend in the data is paralleled in the residuals. Let's remove the identified seasonal component and inspect the supposedly deseasonalized data:
the_ts_deseasonal <- seasadj(the_ts_decomp)
plot(the_ts_deseasonal)
Still looks pretty seasonal to me. ACF and PACFs (not pictured) confirm autocorrelation is occurring.
Acf(the_ts_deseasonal, lag.max=1000)
Pacf(the_ts_deseasonal, lag.max=1000)
Apparently it is possible to account for the seasonal component of the series when forecasting, by passing a fourier transform into the ARMA model as an exogenous covariate, as described by the author of the Forecast package here and here, but I am unclear on exactly how this relates to the need to decompose and deseasonalize the series before fitting a model.
I am able to produce the following forecast using that method based on the above seemingly insufficiently decomposed data. The forecast doesn't seem like a home run, and the residuals confirm something is off:
the_ts.fit <- auto.arima(the_ts, seasonal=FALSE, xreg=fourier(the_ts, K=5))
plot(forecast(the_ts.fit, h=365, xreg=fourier(the_ts, K=5, h=365)))
tsdisplay(residuals(the_ts.fit), lag.max=1000)
I don't understand Hyndman's exreg=fourier solution very well and it could be that I am missing how to correctly apply it to my circumstance.
Question 1: Don't I need to be able to decompose the data down to trendless, seasonless white noise, in order to reconstruct it for the forecast? What about when using the exreg=fourier solution?
Question 2: Why might the above code be failing to extract the seasonal component of the series, and how could I fix it?
Question 3: What package, function, or technique can use to specify an annual seasonality of 365.25?