2

I'm following the R code (here) in Applied Longitudinal Analysis by Fitzmaurice, Laird and Ware to conduct a penalized spline regression analysis using nlme::lme. My actual application will have many more random coefficients, so I plan to use lme4::lmer. I'm having trouble translating the random spline terms bf1-bf22 in the linked code to one that is compatible in lmer. If I understand the lme implementation, random intercepts are specified for the piecewise splines (variables with prefix bf). I (naively) tried adding s(time,k=22,bs='re') in the RHS of the formula in lmer (code below shows gamm4, which uses lme4 functions), but that does not seem to work.

The data can be downloaded from here and the code is reproduced below.

library(foreign)

ds <- read.dta("progesterone.dta") #Wherever the file is stored)
attach(ds)

bf1 <- (time+7)*I(time > -7)
bf2 <- (time+6)*I(time > -6)
bf3 <- (time+5)*I(time > -5)
bf4 <- (time+4)*I(time > -4)
bf5 <- (time+3)*I(time > -3)
bf6 <- (time+2)*I(time > -2)
bf7 <- (time+1)*I(time > -1)
bf8 <- (time)*I(time > 0)
bf9 <- (time-1)*I(time > 1)
bf10 <- (time-2)*I(time > 2)
bf11 <- (time-3)*I(time > 3)
bf12 <- (time-4)*I(time > 4)
bf13 <- (time-5)*I(time > 5)
bf14 <- (time-6)*I(time > 6)
bf15 <- (time-7)*I(time > 7)
bf16 <- (time-8)*I(time > 8)
bf17 <- (time-9)*I(time > 9)
bf18 <- (time-10)*I(time > 10)
bf19 <- (time-11)*I(time > 11)
bf20 <- (time-12)*I(time > 12)
bf21 <- (time-13)*I(time > 13)
bf22 <- (time-14)*I(time > 14)


Const <- factor(rep(1,length(logp)))

group.time <- group*time
group.bf15 <- group*bf15
require(nlme)
model_lme <- lme(logp ~ time + group + group.time + group.bf15, 
              random=list(Const=pdIdent(~-1 + bf1 + bf2 + bf3 + bf4 + bf5 + bf6 + 
                                        bf7 + bf8 + bf9 + bf10 + bf11 + bf12 + bf13 + bf14 + bf15 + bf16 + 
                                        bf17 + bf18 + bf19 + bf20 + bf21 + bf22), 
                          id=pdSymm(~time)))


require(gamm4)
require(mgcv)
model_lmer <- gamm4(logp ~ time + group + group.time + group.bf15 +
                   s(time,k=22,bs="re") + s(time,id,k=22,bs="re"))
Progman
  • 16,827
  • 6
  • 33
  • 48
stats134711
  • 624
  • 1
  • 5
  • 15

1 Answers1

3

load packages

library(foreign)
library(nlme)
library(gamm4)
library(mgcv)
library(ggplot2); theme_set(theme_bw())
library(cowplot)

process data

I streamlined the process of creating the individual linear spline basis functions ... (attach() is almost never a good idea ...)

ds <- read.dta("progesterone.dta") #Wherever the file is stored)
dss <- within(ds, {
    for (i in (-7):14) {
        assign(paste0("bf",i+8), (time+i)*as.numeric(time > i))
    }
})
matplot(dss[dss$id==1,5:26], type ="l")
dss$Const <- factor(rep(1,nrow(dss)))

fit lme model with linear splines

Again, streamlining (using reformulate rather than writing out all of the bf1 + bf2 + ... terms; using time*group rather than constructing all the dummies/interactions by hand).

model_lme <- lme(logp ~ time*group + I(group*bf15), 
                 random=list(Const=pdIdent(
                                 reformulate(c("-1",
                                               paste0("bf",1:22)))),
                             id=pdSymm(~time)),
                 data = dss)

gamm4 version

ds$bf15 <- dss$bf15
model_gamm4 <- gamm4(logp ~time*group + I(group*bf15) +
                         s(time, bs = "cs"),
                     ## need parentheses!
                     random = ~(time|id),
                     data = ds)

A few things to note here: (1) the collection of spline terms is represented by s(time, bs = "cs") (where "cs" stands for "cubic spline": see ?smooth.construct.cs.smooth.spec [substitute for 'cs' to see the help for other choices of smooths]); (2) the other RE term in the original model is a random-slopes term with respect to time (id = pdSymm(~time)), which is represented here as a separate random = (time|id); this component is handled on the lmer-fitting side, not by constructing an mgcv basis for it.

compare predictions

dss2 <- data.frame(ds, pred1 = predict(model_lme), pred2 = predict(model_gamm4$mer))
gg0 <- ggplot(dss, aes(time, logp, colour = factor(group))) + geom_point()
plot_grid(gg0 + geom_line(data=dss2, aes(group=id, y = pred1)) + 
                 labs(title = "lme (linear)"),
   gg0 + geom_line(data=dss2, aes(group=id, y = pred2)) + 
                 labs(title = "gamm4 (cubic)"))

enter image description here

I also tried this in mgcv:

model_gam <- gam(logp ~time*group + I(group*bf15) +
                     s(time, bs = "cs") + s(id, time, bs = "re") +
                     s(id, bs = "re"),
                 data = dss, method = "REML")

The only obvious difference is that the slope and intercept REs are specified independently rather than being correlated, but the predicted values are quite different; I don't know if that's something about the model fit, or something different about how the random effects are treated during prediction ...

Ben Bolker
  • 211,554
  • 25
  • 370
  • 453
  • Thanks! If I wanted to use `lmer` instead of `gamm4` (e.g. manually specify splines like in the text) how would the syntax look? – stats134711 Nov 16 '22 at 20:18
  • 1
    It's actually a little tricky in `lme4` at present, as `lme4` doesn't offer a "homogeneous diagonal variance" customization option as `lme` does. Could be hacked. Might be easier in `glmmTMB`, but would be not-entirely trivial in any case. Probably a separate question, and you might need to have a pretty good reason (I know `gamm4` is clunky; I have some code to make working with it a *little* bit less painful here: https://github.com/bbolker/mmd_utils/blob/master/gamm4_utils.R ) – Ben Bolker Nov 16 '22 at 20:25
  • Wanted to clarify a term in the `gamm4` model - is the smoothing done by `s(time,bs="cs")` doing the same thing as the random effects for the sum of `bf` terms in the `Const=pdIdent(...)` call in `lme`? – stats134711 Nov 21 '22 at 16:18
  • 1
    Approximately - although the penalty term is different ("cs" sets things up so that the penalty term is proportional to the integrated second derivative of the curve, rather than to the sum of squared magnitudes of the individual components ...) – Ben Bolker Nov 21 '22 at 16:23