2

I am building a model using the mgcv package in r. The data has serial measures (data collected during scans 15 minutes apart in time, but discontinuously, e.g. there might be 5 consecutive scans on one day, and then none until the next day, etc.). The model has a binomial response, a random effect of day, a fixed effect, and three smooth effects. My understanding is that REML is the best fitting method for binomial models, but that this method cannot be specified using the gamm function for a binomial model. Thus, I am using the gam function, to allow for the use of REML fitting. When I fit the model, I am left with residual autocorrelation at a lag of 2 (i.e. at 30 minutes), assessed using ACF and PACF plots.

So, we wanted to include an autocorrelation structure in the model, but my understanding is that only the gamm function and not the gam function allows for the inclusion of such structures. I am wondering if there is anything I am missing and/or if there is a way to deal with autocorrelation with a binomial response variable in a GAMM built in mgcv.

My current model structure looks like:

gam(Response ~
                    s(Day, bs = "re") +
                    s(SmoothVar1, bs = "cs") +
                    s(SmoothVar2, bs = "cs") + 
                    s(SmoothVar3, bs = "cs") + 
                    as.factor(FixedVar),
                  family=binomial(link="logit"), method = "REML",
                  data = dat)

I tried thinning my data (using only every 3rd data point from consecutive scans), but found this overly restrictive to allow effects to be detected due to my relatively small sample size (only 42 data points left after thinning). I also tried using the prior value of the binomial response variable as a factor in the model to account for the autocorrelation. This did appear to resolve the residual autocorrelation (based on the updated ACF/PACF plots), but it doesn't feel like the most elegant way to do so and I worry this added variable might be adjusting for more than just the autocorrelation (though it was not collinear with the other explanatory variables; VIF < 2).

CKon
  • 23
  • 5
  • Does this answer your question? [Autocorrelation in Generalized Additive Models (GAM)](https://stackoverflow.com/questions/47586110/autocorrelation-in-generalized-additive-models-gam) – Ric Oct 28 '22 at 23:04
  • I don't believe so - for the reasons explained in my question, I don't believe the gamm function solution proposed is appropriate for my binomial model, and the other proposed solution (bam) appears to be for large datasets, which mine is not. – CKon Oct 29 '22 at 01:15

1 Answers1

1

I would use bam() for this. You don't need to have big data to fit a with bam(), you just loose some of the guarantees about convergence that you get with gam(). bam() will fit a GEE-like model with an AR(1) working correlation matrix, but you need to specify the AR parameter via rho. This only works for non-Gaussian families if you also set discrete = TRUE when fitting the model.

You could use gamm() with family = binomial() but this uses PQL to estimate the GLMM version of the GAMM and if your binomial counts are low this method isn't very good.

Gavin Simpson
  • 170,508
  • 25
  • 396
  • 453
  • Thanks for the suggestion Gavin. I tried it out, but unfortunately it looks like it won't work for my binomial model. I get the warning "AR1 parameter rho unused with generalized model" – CKon Oct 31 '22 at 16:55
  • You need to set `discrete = TRUE` in the model; sorry I should have mentioned that in my answer (I remember intending to do exactly that but it must have slipped my mind or I got distracted) – Gavin Simpson Oct 31 '22 at 17:01
  • I see, thanks! So, I was able to get it to run, but I am wondering what are the differences with using fREML instead of REML (which was also required to get it to run) and setting discrete = T? When I just change those settings (without adding in the autocorrelation yet) my model results seem fairly similar in terms of parameter estimates and AICc values, but when I set discrete = T the first level of my fixed effect goes from being absorbed by the intercept to having it's own estimate. – CKon Nov 01 '22 at 17:52
  • best to read `?bam()` and accompanying papers cited in that help file. You need `method = "fREML"` and that's the default in `bam()`. `discrete = TRUE` uses the fastest algorithm available in {mgcv} for fitting GAMs to large data and it works by not only chunking the formation of model terms through a QR decomposition, but also by discretising the covariates to coarser values rather than using all of the values to their full record precision when setting up the model matrix and bases. – Gavin Simpson Nov 01 '22 at 20:15
  • Thanks for the explanation Gavin, I had read the help post but not the papers in it. So I'll dig in there for more details. – CKon Nov 02 '22 at 21:43