Converting Repeated Measures mixed model formula from SAS to R

Question

There are several questions and posts about mixed models for more complex experimental designs, so I thought this more simple model would help other beginners in this process as well as I.

So, my question is I would like to formulate a repeated measures ancova in R from sas proc mixed procedure:

proc mixed data=df1;
FitStatistics=akaike
class GROUP person day;
model Y = GROUP X1 / solution alpha=.1 cl;
repeated / type=cs subject=person group=GROUP;
lsmeans GROUP;
run;

Here is the SAS output using the data created in R (below):

.           Effect       panel    Estimate       Error      DF    t Value    Pr > |t|     Alpha       Lower       Upper
            Intercept              -9.8693      251.04       7      -0.04      0.9697       0.1     -485.49      465.75
            panel        1         -247.17      112.86       7      -2.19      0.0647       0.1     -460.99    -33.3510
            panel        2               0           .       .        .         .             .           .           .
            X1                     20.4125     10.0228       7       2.04      0.0811       0.1      1.4235     39.4016

Below is how I formulated the model in R using 'nlme' package, but am not getting similar coefficient estimates:

## create reproducible example fake panel data set:
set.seed(94); subject.id = abs(round(rnorm(10)*10000,0))

set.seed(99); sds = rnorm(10,15,5);means = 1:10*runif(10,7,13);trends = runif(10,0.5,2.5)

this = NULL; set.seed(98)
for(i in 1:10) { this = c(this,rnorm(6, mean = means[i], sd = sds[i])*trends[i]*1:6)}
set.seed(97)
that = sort(rep(rnorm(10,mean = 20, sd = 3),6))

df1 = data.frame(day = rep(1:6,10), GROUP = c(rep('TEST',30),rep('CONTROL',30)),
                 Y = this,
                 X1 = that,
                 person = sort(rep(subject.id,6)))

## use package nlme
require(nlme)

## run repeated measures mixed model using compound symmetry covariance structure:
summary(lme(Y ~ GROUP + X1, random = ~ +1 | person,
            correlation=corCompSymm(form=~day|person), na.action = na.exclude,
            data = df1,method='REML'))

Now, the output from R, which I now realize is similar to the output from lm():

                Value Std.Error DF    t-value p-value
(Intercept) -626.1622  527.9890 50 -1.1859379  0.2413
GROUPTEST   -101.3647  156.2940  7 -0.6485518  0.5373
X1            47.0919   22.6698  7  2.0772934  0.0764

I believe I'm close as to the specification, but not sure what piece I'm missing to make the results match (within reason..). Any help would be appreciated!

UPDATE: Using the code in the answer below, the R output becomes:

> summary(model2)

Scroll to bottom for the parameter estimates -- look! identical to SAS.

Linear mixed-effects model fit by REML
 Data: df1 
      AIC      BIC   logLik
  776.942 793.2864 -380.471

Random effects:
 Formula: ~GROUP - 1 | person
 Structure: Diagonal
        GROUPCONTROL GROUPTEST Residual
StdDev:      184.692  14.56864 93.28885

Correlation Structure: Compound symmetry
 Formula: ~day | person 
 Parameter estimate(s):
         Rho 
-0.009929987 
Variance function:
 Structure: Different standard deviations per stratum
 Formula: ~1 | GROUP 
 Parameter estimates:
    TEST  CONTROL 
1.000000 3.068837

Fixed effects: Y ~ GROUP + X1 

                Value Std.Error DF    t-value p-value
(Intercept)   -9.8706 251.04678 50 -0.0393178  0.9688
GROUPTEST   -247.1712 112.85945  7 -2.1900795  0.0647
X1            20.4126  10.02292  7  2.0365914  0.0811

What do you mean about not getting similar results? Do you mean that there is information that's missing, or that you're getting different estimates? If the latter, are you sure that the input data is the same? — David Robinson, Aug 05 '12 at 20:22
I'm getting different estimates. I have indeed checked that the input data is identical also, i.e. df1 in SAS = df1 in R. — baha-kev, Aug 05 '12 at 20:26
might it just be a difference in contrasts of the fixed effects? i.e. `contrasts(df1$GROUP) <- contr.SAS(2)` ? — Ben Bolker, Aug 06 '12 at 01:07
Hi @BenBolker! Good to see you noticing this thread. I'll be curious if you agree with my assessment below or not. I think this is trickier than the OP hoped, but it would be nice if I was wrong. — Aaron left Stack Overflow, Aug 06 '12 at 01:38
@baha-kev: If you add the statistical information and ask it more in terms of what model is appropriate, this would be a great question for stats.stackexchange. — Aaron left Stack Overflow, Aug 06 '12 at 01:56
Hi Aaron, once I digest this I will see I can formulate the problem in an interesting and instructive manner to elicit thoughts on that site. — baha-kev, Aug 08 '12 at 00:55
@baha-kev: What happens if you remove the correlation part of the R code? I'm pretty sure it's unnecessary. Something is definitely unnecessary though, as the R model has 5 variance parameters and the SAS model has 4. Also, can you include the SAS variance components too? — Aaron left Stack Overflow, Aug 08 '12 at 03:26

Triad sou. · Accepted Answer · 2012-08-08T11:12:17.050

5

Please try below:

model1 <- lme(
  Y ~ GROUP + X1,
  random = ~ GROUP | person,
  correlation = corCompSymm(form = ~ day | person),
  na.action = na.exclude, data = df1, method = "REML"
)
summary(model1)

I think random = ~ groupvar | subjvar option with R lme provides similar result of repeated / subject = subjvar group = groupvar option with SAS/MIXED in this case.

Edit:

SAS/MIXED

SAS/MIXED covariance matrix

R (a revised model2)

model2 <- lme(
  Y ~ GROUP + X1,
  random = list(person = pdDiag(form = ~ GROUP - 1)),
  correlation = corCompSymm(form = ~ day | person),
  weights = varIdent(form = ~ 1 | GROUP),
  na.action = na.exclude, data = df1, method = "REML"
)
summary(model2)

R covariance matrix

So, I think these covariance structures are very similar (σ_g1 = τ_g² + σ₁).

Edit 2:

Covariate estimates (SAS/MIXED):

Variance            person          GROUP TEST        8789.23
CS                  person          GROUP TEST         125.79
Variance            person          GROUP CONTROL       82775
CS                  person          GROUP CONTROL       33297

So

TEST group diagonal element
  = 125.79 + 8789.23
  = 8915.02
CONTROL group diagonal element
  = 33297 + 82775
  = 116072

where diagonal element = σ_k1 + σ_k².

Covariate estimates (R lme):

Random effects:
 Formula: ~GROUP - 1 | person
 Structure: Diagonal
        GROUP1TEST GROUP2CONTROL Residual
StdDev:   14.56864       184.692 93.28885

Correlation Structure: Compound symmetry
 Formula: ~day | person 
 Parameter estimate(s):
         Rho 
-0.009929987 
Variance function:
 Structure: Different standard deviations per stratum
 Formula: ~1 | GROUP 
 Parameter estimates:
   1TEST 2CONTROL 
1.000000 3.068837

So

TEST group diagonal element
  = 14.56864^2 + (3.068837^0.5 * 93.28885 * -0.009929987) + 93.28885^2
  = 8913.432
CONTROL group diagonal element
  = 184.692^2  + (3.068837^0.5 * 93.28885 * -0.009929987) + (3.068837 * 93.28885)^2
  = 116070.5

where diagonal element = τ_g² + σ₁ + σ_g².

edited Aug 08 '12 at 11:12

answered Aug 06 '12 at 00:33

Triad sou.

2,969
3
23
27

I'm pretty sure that `random = ~ GROUP | person` doesn't change anything from the original code because each person is only in one group. What that syntax would do is allow the covariance between levels of group to differ within an individual. – Aaron left Stack Overflow Aug 06 '12 at 01:36
I still think the random effect isn't right. `random = list(person = pdDiag(form = ~ GROUP - 1))` is still allowing the covariance between levels of group to differ within an individual, but is forcing them not to be correlated. – Aaron left Stack Overflow Aug 06 '12 at 17:15
Also, since R actually parameterizes the model in terms of the correlation and the variances, it's hard to see how your matrix matches the R code you wrote. Not that it's necessarily wrong, but it would help if you used R's parameterization to explain. – Aaron left Stack Overflow Aug 06 '12 at 17:17
This darn near perfectly matches the proc mixed output so I will mark as answered. Thanks Triad! – baha-kev Aug 08 '12 at 00:51
@baha-kev: That would be cool! But when I ran it I couldn't see how the variance terms agreed; can you share your results by editing the original post? – Aaron left Stack Overflow Aug 08 '12 at 01:04
Updated the question with reproducible output -- is that what you are looking for? – baha-kev Aug 08 '12 at 01:24
@Triadsou: I was both right and wrong about what the `random = list(person = pdDiag(form = ~ GROUP - 1))` line does; it is fitting a variance matrix for the levels of GROUP, in this case, forced to be diagonal, but it isn't necessarily for levels of group within an individual, it works just fine when each individual is only part of one group too. – Aaron left Stack Overflow Aug 08 '12 at 03:31
So the random line gives different variances to the random effects for each group, and the weights line gives different variances to the residual variance for each group; that should be enough to model a separate compound symmetry structure for each group. Are you sure the correlation line is necessary? – Aaron left Stack Overflow Aug 08 '12 at 03:32
I only tried to make an equivalent R script for SAS/MIXED `repeated / subject = subjvar group = groupvar` statment. I agree that the correlation line is not necessary. But, it was failed (gave a different result) for highly correlated data without the correlation line. Further investigations might be needed. – Triad sou. Aug 08 '12 at 04:08
Probably, the model without the correlation line cannot handle a negative covariance, because τ_g^2 > 0. Thanks, @Aaron. – Triad sou. Aug 08 '12 at 11:03
Thanks for the additional info, @Triadsou. I agree that the model without the correlation line couldn't handle a negative covariance; it certainly could be that adding that line allows that somehow. Also, are you sure the correlation term should be included in your variance calculations? – Aaron left Stack Overflow Aug 08 '12 at 12:02
As indicated above, the correlation term have a influence on fixed effect estimates in highly negative correlated data. So, in order to handle models that have heterogeneous covariance structures like this, the correlation term should be included. Of course, it might not be the optimal way. It is a limitation of the `nlme::lme`. – Triad sou. Aug 08 '12 at 12:33

score 4 · Answer 2 · answered Aug 06 '12 at 01:34

Oooh, this is going to be a tricky one, and if it's even possible using standard nlme functions, is going to take some serious study of Pinheiro/Bates.

Before you spend the time doing that though, you should make absolutely sure that this is exact model you need. Perhaps there's something else that might fit the story of your data better. Or maybe there's something R can do more easily that is just as good, but not quite the same.

First, here's my take on what you're doing in SAS with this line:

repeated / type=cs subject=person group=GROUP;

This type=cs subject=person is inducing correlation between all the measurements on the same person, and that correlation is the same for all pairs of days. The group=GROUP is allowing the correlation for each group to be different.

In contrast, here's my take on what your R code is doing:

random = ~ +1 | person,
correlation=corCompSymm(form=~day|person)

This code is actually adding almost the same effect in two different ways; the random line is adding a random effect for each person, and the correlation line is inducing correlation between all the measurements on the same person. However, these two things are almost identical; if the correlation is positive, you get the exact same result by including either of them. I'm not sure what happens when you include both, but I do know that only one is necessary. Regardless, this code has the same correlation for all individuals, it's not allowing each group to have their own correlation.

To let each group have their own correlation, I think you have to build a more complicated correlation structure up out of two different pieces; I've never done this but I'm pretty sure I remember Pinheiro/Bates doing it.

You might consider instead adding a random effect for person and then letting the variance be different for the different groups with weights=varIdent(form=~1|group) (from memory, check my syntax, please). This won't quite be the same but tells a similar story. The story in SAS is that the measurements on some individuals are more correlated than the measurements on other individuals. Thinking about what that means, the measurements for individuals with higher correlation will be closer together than the measurements for individuals with lower correlation. In contrast, the story in R is that the variability of measurements within individuals varies; thinking about that, measurements with higher variability with have lower correlation. So they do tell similar stories, but come at it from opposite sides.

It is even possible (but I would be surprised) that these two models end up being different parameterizations of the same thing. My intuition is that the overall measurement variability will be different in some way. But even if they aren't the same thing, it would be worth writing out the parameterizations just to be sure you understand them and to make sure that they are appropriately describing the story of your data.

A final thought though, changing the the correlation structure usually doesn't affect the estimates of the fixed effects, so if that's what's different, there may be something else going too. — Aaron left Stack Overflow, Aug 06 '12 at 01:54
Your answer sounds reasonable to me, but I really think we need to hear/see more from the OP about what the output of each program looks like (I have access to SAS, but not conveniently) and where the key differences are ... — Ben Bolker, Aug 06 '12 at 04:01
Thanks, @BenBolker. I haven't tried running the OP's code either; I have access to SAS but not conveniently from home. — Aaron left Stack Overflow, Aug 06 '12 at 04:36
One thought -- I get errors when I delete the `random=` term while keeping the `correlation=` term. If, as you say, they are redundant then I should be able to remove one. — baha-kev, Aug 08 '12 at 00:58
@baha-kev: Be sure to switch to `gls` when you remove the random effects. — Aaron left Stack Overflow, Aug 08 '12 at 01:03

Converting Repeated Measures mixed model formula from SAS to R

2 Answers2

Linked