5

I try to reproduce some SAS output using R. the method I want to reproduce is :

Two way anaysis of variance with repeated measures on factor time using mixed models (covariance matrix = CS, Estimation method = REML)

Everything looks fine exept AIC... I would like to know if someone know the AIC formula used by SAS...

the main SAS output are :

anova table

AIC and co

The Anova table is the same, but not the AIC (and BIC), event if the loglik is the same.

thats what I did with R :

library(nlme)
dataset_melt <- structure(list(Groupe = c("A", "A", "A", "A", "A", "B", "B", 
"B", "B", "B", "C", "C", "C", "C", "C", "A", "A", "A", "A", "A", 
"B", "B", "B", "B", "B", "C", "C", "C", "C", "C", "A", "A", "A", 
"A", "A", "B", "B", "B", "B", "B", "C", "C", "C", "C", "C", "A", 
"A", "A", "A", "A", "B", "B", "B", "B", "B", "C", "C", "C", "C", 
"C", "A", "A", "A", "A", "A", "B", "B", "B", "B", "B", "C", "C", 
"C", "C", "C"), ID = c("01/001", "01/002", "01/003", "01/004", 
"01/005", "02/001", "02/002", "02/003", "02/004", "02/005", "03/001", 
"03/002", "03/003", "03/004", "03/005", "01/001", "01/002", "01/003", 
"01/004", "01/005", "02/001", "02/002", "02/003", "02/004", "02/005", 
"03/001", "03/002", "03/003", "03/004", "03/005", "01/001", "01/002", 
"01/003", "01/004", "01/005", "02/001", "02/002", "02/003", "02/004", 
"02/005", "03/001", "03/002", "03/003", "03/004", "03/005", "01/001", 
"01/002", "01/003", "01/004", "01/005", "02/001", "02/002", "02/003", 
"02/004", "02/005", "03/001", "03/002", "03/003", "03/004", "03/005", 
"01/001", "01/002", "01/003", "01/004", "01/005", "02/001", "02/002", 
"02/003", "02/004", "02/005", "03/001", "03/002", "03/003", "03/004", 
"03/005"), temps = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 
2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 
3L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 
4L, 4L, 4L, 4L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 
5L, 5L, 5L), .Label = c("T0", "T1", "T2", "T3", "T4"), class = "factor"), 
    value = c(29.4, 21, 23.4, 26.2, 28.5, 27.8, 27.2, 20.6, 20.2, 
    25.3, 26.2, 29.2, 27.1, 23.1, 20.6, 22.9, 29.6, 20.9, 25.2, 
    25, 26, 26.7, 25.1, 21, 28.2, 23.4, 27.1, 29.8, 22.2, 26.6, 
    29.9, 29.1, 23.4, 22.6, 25.7, 24.5, 29.6, 21.5, 28.9, 20.1, 
    26.5, 23.4, 24.9, 25.3, 25, 27.4, 29.5, 24.6, 27.4, 24.6, 
    21.3, 23.6, 22.8, 23.6, 20.6, 26.5, 29.2, 20.6, 25.7, 29.1, 
    23.7, 24.3, 28.7, 21.9, 23.7, 29.8, 27.1, 28.7, 28.3, 20.4, 
    28.7, 20.3, 22.8, 23.4, 21.5)), row.names = c(NA, -75L), .Names = c("Groupe", 
"ID", "temps", "value"), class = "data.frame")

options(contrasts=c("contr.SAS","contr.poly"))
mon_lme <- lme(value ~ Groupe *temps, random = ~ +1 | ID,
        correlation=corCompSymm(form=~temps|ID), #na.action = na.exclude,
        data = dataset_melt,method='REML')
anova(mon_lme) # quite same as SAS
             numDF denDF  F-value p-value
(Intercept)      1    48 6040.352  <.0001
Groupe           2    12    0.495  0.6215
temps            4    48    0.057  0.9938
Groupe:temps     8    48    1.175  0.3334
summary(mon_lme)$AIC
# 363.938
summary(mon_lme)$BIC
# 399.5419

k <- attr(logLik(mon_lme), "df")
aic <- 2 * k -2 * logLik(mon_lme) 
aic

-2 * logLik(mon_lme) # the same as SAS
#'log Lik.' 329.6698 (df=18)

What is the SAS AIC calculation method ?

Regards

Ben Bolker
  • 211,554
  • 25
  • 370
  • 453
Vincent Guyader
  • 2,927
  • 1
  • 26
  • 43
  • 3
    Do you have the SAS code to reproduce your output? Guess is that SAS is giving you the AIC of only the GROUP effect and not the full model. You can back out the `k` value from those numbers. Provide the code and maybe we can find the option to produce the number you want. – DomPazz Jan 08 '18 at 15:13

2 Answers2

8

You can find the calculation of the AIC according to SAS in the help pages, eg here :

http://support.sas.com/documentation/cdl/en/statug/63033/HTML/default/viewer.htm#statug_mixed_sect008.htm#statug.mixed.mixedic

AIC is here calculated as -2LL + 2d

with LL being the maximum value of the log likelihood, and d the dimension of the model. In the case of restricted likelihood estimation, d represents the effective number of estimated covariance parameters. In this case that is 2 parameters as shown in your output.

On the other hand, R uses the degrees of freedom as calculated by Pinheiro and Bates. And they have a vastly different interpretation of degrees of freedom in the context of a mixed model as the one used by SAS. You can see that by using the function logLik :

> logLik(mon_lme)
'log Lik.' -164.8349 (df=18)

So in R, the value of d is 18. But R also uses k=2 for the standard calculation of AIC.

Joris Meys
  • 106,551
  • 31
  • 221
  • 263
  • For anyone who visits this in the future, note that you **cannot use AIC or log-likelihood to compare models differing in the fixed-effect component and fitted with REML**. Therefore, the results of any legitimate model comparison *within a platform* (i.e. between models with different variance components both fitted in SAS, or both fitted in R) will be the same (i.e. the ΔAIC will be the same between platforms because the AICs will differ by a model-independent amount) – Ben Bolker Jul 25 '23 at 20:10
0

I tried to find out via trial and error and I think that SAS uses the AIC formula with k = 2. That gives: 2*2 - 2* (-164.8349) = 333.6698 which is close to the value in the table. However, that's not what the value for k should be and looks like a bug to me.

Vandenman
  • 3,046
  • 20
  • 33
  • 1
    This would be in accordance with @DomPazz comment saying that SAS may use GROUP effect only. There are 2 groups: `groupe` & `temps`. – Sébastien Rochette Jan 08 '18 at 15:19
  • My guess is you can control what goes into that AIC calculation in SAS. You just need the option and I'll need to know what procedure he is using to find it. – DomPazz Jan 08 '18 at 15:25
  • R also uses k=2. The difference lies in the definition of degrees of freedom / dimension of the model. – Joris Meys Jan 08 '18 at 15:38