6

I have a time series dataset for water temperature, air temperature, and flow rate in a river. I have created a GAM model to predict water temperature based on air temp and flow. However, I have not accounted for the autocorrelation in the datasets. Each data point within the predictors and dependent variable are not independent (i.e air temperature on day 2 is not independent of air temperature on day 1).

Can someone help me with the appropriate code to include some form of autocorrelation measure (AR1?) within my model. As I understand it, I need to use the gamm() function instead of the gam() function?

My current model looks like this:

model <- gam(W.T.Mean ~ s(T.Mean) +s(Discharge), data = Pre_regulation_temp)

  • W.T.Mean is mean daily water temperature.
  • T.Mean is mean daily air temperature.
  • Discharge is mean daily flow.

Thanks in advance.

Shawn Hemelstrand
  • 2,676
  • 4
  • 17
  • 30
Daniel
  • 63
  • 1
  • 4

1 Answers1

6

You actually have several choices

  1. gamm() with correlation = corAR1(form = ~ time) (where time is the variable giving you the ordering in time of the evenly spaced observations
  2. bam() and specify a known value of rho, the AR(1) parameter.

That said, the issue for inference is that conditional upon the estimated model (i.e. effects of covariates) the response is independent and identically distributed. Put another way, we expect the residuals of the model to be independent (not autocorrelated). If the instantaneous (smooth) effect of air temperature on water temperature is sufficient to leave the model residuals independent then you do not necessarily need to do anything to correct the model.

However, if the estimated smooth effect of air temperature is quite wiggly, that might suggest that the estimated effect is being affected by autocorrelation in the data. I would expect a relatively simple relationship between air and water temperature, with saturating effects at both low and high ends & mdash; you can't make water go less than 0 but air temps can go well below, & likewise at the high end you don't get the same increase in water temperature for a unit increase in air temp. So check the estimated smooth to see if the effect is more complex than you would expect. If it is, you should try fitting with gamm() and see how that changes the estimated smooth. If it doesn't make much difference, then I'd go back to my original gam() and look at the autocorrelation function of the model residuals and if that shows a problem with autocorrelation, then you either need to correct it by adding terms to your gam() model or switch back to gamm() with correlation = .... specified and use that for inference.

Other, more complex option, is to use the brms package, which can also estimate models with AR or ARMA correlation structrues.

Gavin Simpson
  • 170,508
  • 25
  • 396
  • 453
  • Do I understand right that you can create ARMA timeseries models with MGCV? – hhh Sep 03 '18 at 19:49
  • 1
    @hhh no; you can include an ARMA process for the residuals of a regression model using *mgcv* (via *nlme*). `bam()` allows one to specify an AR(1) process for the residuals but the value of rho, which is typically estimated in time series modelling functions, needs to be supplied. – Gavin Simpson Sep 04 '18 at 15:46
  • I tried BRMS on ARMA correlation structures but it is only supported by Gaussian priors, not with distributions such as lognormal. So the next step would be to create the model with STAN directly or what about rstanarm? – hhh Sep 06 '18 at 11:49
  • 1
    @hhh can't you write a model for the mean and add the ARMA to that and then at the data layer say that Y ~ LN(mu, ...). Might not be possible with BRMS but you could probably set up the STAN code using it, and then modify just the part for the ARMA and where it applies? – Gavin Simpson Sep 06 '18 at 15:20