
I have googled this and could not find a solution.

It seems R has an issue with the AIC/BIC calculation: it produces incorrect results. A simple example is shown below:

link = 'https://gist.githubusercontent.com/seankross/a412dfbd88b3db70b74b/raw/5f23f993cd87c283ce766e7ac6b329ee7cc2e1d1/mtcars.csv'
df = read.csv(link, row.names = 'model')
form = 'mpg ~ disp + hp + wt + qsec + gear'
my_model = lm(form, data = df)
summary(my_model)
cat('AIC:',AIC(my_model),'\tBIC:',AIC(my_model, k = log(nrow(df))))

AIC: 157.4512   BIC: 167.7113

Doing exactly the same thing in Python, I obtain:

import pandas as pd
from statsmodels.formula.api import ols as lm

link = 'https://gist.githubusercontent.com/seankross/a412dfbd88b3db70b74b/raw/5f23f993cd87c283ce766e7ac6b329ee7cc2e1d1/mtcars.csv'
df = pd.read_csv(link, index_col='model')
form = 'mpg ~ disp + hp + wt + qsec + gear'
my_model = lm(form, df).fit()
my_model.summary()
print(f'AIC: {my_model.aic:.4f}\tBIC: {my_model.bic:.4f}')

AIC: 155.4512   BIC: 164.2456

If you check summary(my_model) in R and my_model.summary() in Python, you will notice that the two models are EXACTLY the same in everything apart from the AIC and BIC.

I decided to compute it manually in R:

p = length(coef(my_model)) # number of coefficients INCLUDING the intercept, i.e. 6
s = sqrt(sum(resid(my_model)^2)/nrow(df)) # ML estimate of sigma: sqrt(sigma(my_model)^2 * (nrow(df) - p)/nrow(df))
logl = -2 * sum(dnorm(df$mpg, fitted(my_model), s, log = TRUE)) # -2 * log-likelihood

c(aic = logl + 2*p, bic = logl + log(nrow(df))*p)
 aic      bic 
155.4512 164.2456 

This matches the results produced by Python.
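
For reference, the same number falls out of the closed form of the Gaussian log-likelihood profiled over the variance (with the ML estimate sigma^2 = RSS/n):

n   = nrow(df)
rss = sum(resid(my_model)^2)
-n/2 * (log(2*pi) + 1 - log(n) + log(rss)) # -71.7256, i.e. sum(dnorm(..., log = TRUE))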

Digging deeper, I noticed that AIC uses the logLik function, and that is where the problem arises: logLik(my_model) gives exactly the same value as logl above before the multiplication by -2, but the df is given as 7 instead of 6.
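
You can see it directly on the returned object:

ll = logLik(my_model)
ll             # 'log Lik.' -71.7256 (df=7)
attr(ll, 'df') # 7, i.e. my_model$rank (6) plus 1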

If I brute-force the rank to make it 6, I get the correct results, i.e.:

my_model$rank = my_model$rank - 1
cat('AIC:',AIC(my_model),'\tBIC:',AIC(my_model, k = log(nrow(df))))

AIC: 155.4512   BIC: 164.2456

Why does R add 1 to the number of predictors? You can view the logLik function used by base R by typing stats:::logLik.lm in your R console (e.g. in RStudio) and pressing Enter. The two lines below seem to be where the issue arises:

function (object, REML = FALSE, ...) 
{
    ...
    p <- object$rank
    ...
    attr(val, "df") <- p + 1 # This line here. Why does R ADD 1?
    ...
}
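
AIC() then takes the df attribute of the logLik object as the parameter count, which is exactly where the extra penalty of 2 (and log(n) for BIC) enters:

ll = logLik(my_model)
-2 * as.numeric(ll) + 2 * attr(ll, 'df')             # 157.4512, R's AIC
-2 * as.numeric(ll) + log(nrow(df)) * attr(ll, 'df') # 167.7113, R's BIC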
Onyambu
    Possibly related: https://stackoverflow.com/questions/37917437/loglik-lm-why-does-r-use-p-1-instead-of-p-for-degree-of-freedom – MrFlick Oct 26 '20 at 03:48
  • @MrFlick thanks so much. Although why would R add one, claiming it to be an estimate of the standard error, yet Python does not? Don't you think the authors of statsmodels should have considered this and incorporated it? – Onyambu Oct 26 '20 at 12:30
  • I can't speak for the authors of statsmodels, nor can I speak for the authors of R. All statistical results are based on assumptions that reasonable people might disagree about. You can write your own AIC function to define it however you like. – MrFlick Oct 26 '20 at 17:03
  • I have to say that I would *prefer* a more neutrally phrased title. While R (like all software) does contain mistakes and bugs, it's very unlikely that a frequently used calculation like this would be flat-out wrong: "unexpected" or "why do statsmodels and R disagree?" would be better IMO ... – Ben Bolker Apr 17 '21 at 17:07
  • I agree with Ben. Neither definition is an incorrect AIC. There are arguments for both, and I don't see a clear winner from the theoretical side. Statsmodels will not change the current definition, but the plan is to add options for the user. But more importantly, add variations of AIC that are appropriate under penalization or misspecification, e.g. GAIC, TIC for the latter. – Josef Apr 17 '21 at 17:24
  • Also, the penalization term in AIC, BIC is linear in the number of parameters. So it adds just a constant that does not affect the ordering of models for selection by AIC or BIC, as long as those models treat (non-)inclusion of the scale parameter consistently. – Josef Apr 17 '21 at 17:26

1 Answer


This is clearly a deliberate choice: R counts the scale parameter in the set of estimated parameters. From ?logLik.lm:

For ‘"lm"’ fits it is assumed that the scale has been estimated (by maximum likelihood or REML)

(see also here, pointed out by @MrFlick in the comments). This kind of ambiguity (and also whether normalization constants are included in the log-likelihoods: in R, they are) always has to be checked before comparing results across platforms, and sometimes even across procedures or functions within the same platform.
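
For these data the reconciliation is exact: the df counts differ by one, so the AIC values differ by 2 and the BIC values by log(32):

AIC(my_model) - 2                                # 155.4512, statsmodels' AIC
AIC(my_model, k = log(nrow(df))) - log(nrow(df)) # 164.2456, statsmodels' BIC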

As discussed in the comments, the one-unit difference in AIC is confusing when comparing across platforms, but has no effect on any conclusions within a platform, because all conclusions (selected model(s), model weights) are based on differences in AIC values, never on the absolute (total) value of AIC. Indeed, it's often useful to start by converting AIC to ΔAIC by subtracting the minimum AIC in a set of models.
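
For example (a quick sketch: m2 is just an arbitrary second candidate model on the same data), the "+1 for sigma" adds the same constant 2 to every model's AIC, so it cancels in the differences:

m1 = lm(mpg ~ disp + hp + wt + qsec + gear, data = df)
m2 = lm(mpg ~ wt + qsec, data = df) # arbitrary smaller candidate, for illustration
aics = c(m1 = AIC(m1), m2 = AIC(m2))
aics - min(aics)                    # delta-AIC: unaffected by the sigma convention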


For what it's worth, there also seems to be plenty of discussion of this from the statsmodels side, e.g. this (closed) issue about why AIC/BIC are inconsistent between R and statsmodels ...


This commit in March 2002 shows Martin Maechler changing the "df" (degrees of freedom/number of model parameters) attribute back to object$rank+1 with the following additional annotations:

The help page ?logLik.lm gains:

Note that error variance \eqn{\sigma^2} is estimated in \code{lm()} and hence counted as well.

(this message was obviously edited at some later point to the version seen above).

The NEWS file gains (under "BUG FIXES"):

 o    logLik.lm() now uses  "df = p + 1" again (`+ sigma'!).

It was hard for me to do the archaeology back further than this (presumably, based on the messages here, the p+1 reckoning was originally used, then someone changed it to p, and MM changed it back in 2002), because functions moved around (this file was created in 2001, so earlier versions are harder to find). I didn't find any discussion of this in the r-devel mailing list archive for Feb or Mar 2002 ...

Ben Bolker