nlme::gls() R code on Applied Longitudinal Analysis, 2nd Edition website needs some tweaks

Question

My question is fairly straightforward. When I run the code below, copied directly from the "Applied Longitudinal Analysis" website, here: https://content.sph.harvard.edu/fitzmaur/ala2e/ (It's Chapter 5, Section 5.7),

library(foreign)
ds <- read.dta("tlc.dta")
ds$baseline <- ds$y0
tlclong <- reshape(ds, idvar="id", varying=c("y0","y1","y4","y6"),v.names="y", timevar="time", time=1:4, direction="long")
tlclong <- subset(tlclong, time > 1)
attach(tlclong)

week <- time
week[time==2] <- 1
week[time==3] <- 4
week[time==4] <- 6
time <- time - 1
week.f <- factor(week, c(1,4,6))
change <- y - baseline
cbaseline <- baseline - 26.406

library(nlme)
model <- gls(y ~ I(week.f==1) + I(week.f==4) + I(week.f==6) + I(week.f==1 & trt=="Succimer") + I(week.f==4 & trt=="Succimer") + I(week.f==6 & trt=="Succimer"), corr=corSymm(, form= ~ time | id), weights = varIdent(form = ~ 1 | week.f))

summary(model)

I get output that is completely different in a few very crucial areas.

This is the output I get:

Generalized least squares fit by REML
  Model: y ~ I(week.f == 1) + I(week.f == 4) + I(week.f == 6) + I(week.f ==      1 & trt == "Succimer") + I(week.f == 4 & trt == "Succimer") +      I(week.f == 6 & trt == "Succimer") 
  Data: NULL 
       AIC      BIC    logLik
  2028.922 2076.764 -1001.461

Correlation Structure: General
 Formula: ~time | id 
 Parameter estimate(s):
 Correlation: 
  1 2
2 0  
3 0 0
Variance function:
 Structure: Different standard deviations per stratum
 Formula: ~1 | week.f 
 Parameter estimates:
1 4 6 
1 1 1 

Coefficients:
                                         Value Std.Error   t-value p-value
(Intercept)                             23.646  1.002955 23.576329  0.0000
I(week.f == 1)TRUE                       1.014  1.418393  0.714894  0.4752
I(week.f == 4)TRUE                       0.424  1.418393  0.298930  0.7652
I(week.f == 1 & trt == "Succimer")TRUE -11.138  1.418393 -7.852550  0.0000
I(week.f == 4 & trt == "Succimer")TRUE  -8.556  1.418393 -6.032180  0.0000
I(week.f == 6 & trt == "Succimer")TRUE  -2.884  1.418393 -2.033287  0.0429

 Correlation: 
                                       (Intr) I(.==1 I(.==4 I=1&t=" I=4&t="
I(week.f == 1)TRUE                     -0.707                              
I(week.f == 4)TRUE                     -0.707  0.500                       
I(week.f == 1 & trt == "Succimer")TRUE  0.000 -0.500  0.000                
I(week.f == 4 & trt == "Succimer")TRUE  0.000  0.000 -0.500  0.000         
I(week.f == 6 & trt == "Succimer")TRUE -0.707  0.500  0.500  0.000   0.000 

Standardized residuals:
       Min         Q1        Med         Q3        Max 
-2.3494198 -0.6575048 -0.1467858  0.5279214  6.0826597 

Residual standard error: 7.091964 
Degrees of freedom: 300 total; 293 residual

And here is the output shown on the website linked above:

Generalized least squares fit by REML
Model: y ~ I(week.f == 1) + I(week.f == 4) + I(week.f == 6) + I(week.f == 1 & trt == "Succimer") + I(week.f == 4 & trt == "Succimer") + I(week.f == 6 & trt == "Succimer") 

  Data: NULL 
       AIC      BIC    logLik
  2451.990 2519.544 -1208.995

Correlation Structure: General
 Formula: ~time | id 
 Parameter estimate(s):
 Correlation: 
  1     2     3    
2 0.569            
3 0.568 0.775      
4 0.575 0.581 0.580

Variance function:
 Structure: Different standard deviations per stratum
 Formula: ~1 | week.f 
 Parameter estimates:
       0        1        4        6 
1.000000 1.330103 1.374827 1.529615 

Coefficients:
                                            Value Std.Error   t-value p-value
(Intercept)                             26.406000 0.4998908  52.82354  0.0000
I(week.f == 1)TRUE                      -1.644501 0.7824044  -2.10186  0.0362
I(week.f == 4)TRUE                      -2.231356 0.8073811  -2.76370  0.0060
I(week.f == 6)TRUE                      -2.642065 0.8864616  -2.98046  0.0031
I(week.f == 1 & trt == "Succimer")TRUE -11.340998 1.0931205 -10.37488  0.0000
I(week.f == 4 & trt == "Succimer")TRUE  -8.765288 1.1312570  -7.74827  0.0000
I(week.f == 6 & trt == "Succimer")TRUE  -3.119869 1.2507776  -2.49434  0.0130

 Correlation: 
                                       (Intr) I(.==1 I(.==4 I(.==6 I=1&t="
I(week.f == 1)TRUE                     -0.155                             
I(week.f == 4)TRUE                     -0.136  0.674                      
I(week.f == 6)TRUE                     -0.068  0.381  0.380               
I(week.f == 1 & trt == "Succimer")TRUE  0.000 -0.699 -0.467 -0.265        
I(week.f == 4 & trt == "Succimer")TRUE  0.000 -0.466 -0.701 -0.265  0.667 
I(week.f == 6 & trt == "Succimer")TRUE  0.000 -0.263 -0.263 -0.705  0.376 

                                       I=4&t="

I(week.f == 1)TRUE                            
I(week.f == 4)TRUE                            
I(week.f == 6)TRUE                            
I(week.f == 1 & trt == "Succimer")TRUE        
I(week.f == 4 & trt == "Succimer")TRUE        
I(week.f == 6 & trt == "Succimer")TRUE  0.375 

Standardized residuals:
       Min         Q1        Med         Q3        Max 
-2.1636401 -0.7011814 -0.1426534  0.5374840  5.6570302 
 
Residual standard error: 4.998908 
Degrees of freedom: 400 total; 393 residual

Note that

The AIC, BIC, and LogLik values are all different
The correlation output is completely different. My output gives all zeros, and is missing an entire dimension in the correlation matrix.
My variance function parameter estimates are completely different; also missing the 0 output shown on the website.
The estimated coefficients are wildly different in most cases; also my output is completely missing the week.f == 6 results.
The correlation matrix outputted below the coefficients results is completely different.
My output says 300 degrees of freedom total; Fitzmaurice's says 400 total (the rest of that last section of output is obviously different as well).

Also, when I run the code, I get an error saying

Error in glsEstimate(object, control = control) : 
  computed "gls" fit is singular, rank 7

So, to even get any output at all, I had to add this argument to the gls() function:

control = list(singular.ok = TRUE)

So my question is, why is the output so different when I run the code myself? Clearly whoever made the website neglected to copy the code correctly.

If you would like to try running the code yourself, the website linked above has the dataset tlc.dta available for download.

Were did you get `tlc.dta` from? It looks like the data is missing 100 observations. Their model specification is just bad IMHO. See this: https://stats.stackexchange.com/a/70700/11849 — Roland, Oct 14 '21 at 13:34
The tlc.dta data is available for download on the website I linked to. See 'datasets' in the menu: https://content.sph.harvard.edu/fitzmaur/ala2e/ And yes I agree in that I have no idea why the model specification needed to be that way. Not something I've ever done before anyway. I guess maybe that's just what "analysis of response profiles" is all about though. — James Cutler, Oct 14 '21 at 14:02
I just found the answer to my own question. The code on Fitzmaurice's website clearly doesn't match the output shown. I commented out the parts where the data is subsetted to time > 1, and changed the remaining code accordingly (added a " week[time==1] <- 0 ", etc.), and ran the model with just week.f instead of all those I()'s for week.f, and got the exact same output as shown on the website. Case closed I guess. I have no idea why Fitzmaurice's code is wrong though lol. Also, I'm not sure how to flag this issue as solved. — James Cutler, Oct 14 '21 at 14:23

James Cutler · Accepted Answer · 2021-10-14T15:36:10.080

Changing Fitzmaurice's code slightly, to INCLUDE time == 1 (and therefore week == 0), I get the same output as shown on the website:

ds <- foreign::read.dta("tlc.dta")
ds$baseline <- ds$y0
tlclong <- reshape(ds, idvar="id", varying=c("y0","y1","y4","y6"),
                   v.names="y", timevar="time", time=1:4, direction="long")
# tlclong <- subset(tlclong, time > 1)
attach(tlclong)

week <- time
week[time==1] <- 0
week[time==2] <- 1
week[time==3] <- 4
week[time==4] <- 6
time <- time - 1
week.f <- factor(week, c(0,1,4,6))
change <- y - baseline
cbaseline <- baseline - 26.406

library(nlme)
model <- gls(y ~ week.f +
                 I(week.f==1 & trt=="Succimer") +
                 I(week.f==4 & trt=="Succimer") +
                 I(week.f==6 & trt=="Succimer"),
             corr=corSymm(form= ~ time | id),
             weights = varIdent(form = ~ 1 | week.f)
             )
summary(model)

So in conclusion, I should change the wording of my question from "nlme's output is different now" to "Fitzmaurice forgot to put what his code MUST have certainly been originally, in order to get the output shown on his website". My bad.

Have you run your code recently? I am catching this error: ` Error in model.frame.default(formula = ~time + id + week.f + y + trt, : variable lengths differ (found for 'id') ` — user1916067, Oct 31 '22 at 21:10

nlme::gls() R code on Applied Longitudinal Analysis, 2nd Edition website needs some tweaks

1 Answers1