I'm learning more about the lme4 package and have appreciated both Bodo Winter's tutorial and this guide from Tufts. However, the two guides suggest different methods for determining the significance of a fixed effect.

Winter suggests using R's anova() function to compare a model with the fixed effect in question against one without it.

In contrast, the Tufts guide first suggests using the car package's Anova() function (it also mentions the anova() approach).

However, as the toy example below shows, the two methods return different chi-squared statistics and p-values.

library(lme4)
# meaningless models
lmer_wt_null = lmer(mpg ~ (1 + wt | cyl), data = mtcars, REML = FALSE)
lmer_wt_full = lmer(mpg ~ wt + (1 + wt | cyl), data = mtcars, REML = FALSE)

# stats::anova output (Winter)
anova(lmer_wt_null, lmer_wt_full)

# Data: mtcars
# Models:
# lmer_wt_null: mpg ~ (1 + wt | cyl)
# lmer_wt_full: mpg ~ wt + (1 + wt | cyl)
#              Df    AIC    BIC  logLik deviance  Chisq Chi Df Pr(>Chisq)
# lmer_wt_null  5 167.29 174.62 -78.647   157.29
# lmer_wt_full  6 163.14 171.93 -75.568   151.14 6.1563      1    0.01309 *
# ---
# Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

library(car)
# car::Anova output (Tufts)
Anova(lmer_wt_full)

# Analysis of Deviance Table (Type II Wald chisquare tests)
#
# Response: mpg
#     Chisq Df Pr(>Chisq)
# wt 19.213  1  1.169e-05 ***
# ---
# Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

What are the two methods doing differently, and what is the meaning of the difference between these p-values?

I'm almost certain that I'm missing something basic. Thanks.

1 Answer

I'm going to vote to migrate this to CrossValidated, but: this is a very nice example of the difference between Wald and likelihood ratio test p-values.

The results from anova() are based on a likelihood ratio test; they are equivalent (you can check) to computing the upper tail area of a chi-squared distribution, with one degree of freedom for the single dropped parameter, at the deviance difference between the two models:

pchisq(deviance(lmer_wt_null)-deviance(lmer_wt_full), df=1, lower.tail=FALSE)
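As a quick sanity check (my addition, using the models fit above): the statistic being tested is just twice the log-likelihood difference between the two fits, which is the Chisq value anova() reports:

## twice the log-likelihood difference (both models were fit with ML, not REML)
2 * (as.numeric(logLik(lmer_wt_full)) - as.numeric(logLik(lmer_wt_null)))
## approx. 6.1563, the Chisq value reported by anova() above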

The results from car::Anova() are based on Wald tests, which make a stronger assumption (they assume the log-likelihood surface is quadratic). The test here is a two-tailed test based on the upper tail of a Normal distribution [1], [2]:

(cc <- coef(summary(lmer_wt_full)))
2*pnorm(abs(cc["wt","t value"]),lower.tail=FALSE)
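For what it's worth (my addition, not part of the original answer): the chi-squared statistic that car::Anova() reports is just the square of this Wald z statistic, so the two forms of the test give the same p-value:

z <- cc["wt", "t value"]
z^2                                      ## approx. 19.213, the Chisq from Anova()
pchisq(z^2, df = 1, lower.tail = FALSE)  ## approx. 1.169e-05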

We can get slightly more insight into this by computing and plotting the likelihood profile; where the log-likelihood deviates from a quadratic curve (a straight line on the plot below), the Wald test fails.

pp <- profile(lmer_wt_full)
dd <- as.data.frame(pp)
est <- cc["wt", "Estimate"]
se  <- cc["wt", "Std. Error"]
library(ggplot2)
## profile zeta (black) vs. the Wald approximation (red);
## the Wald line is zeta = (wt - est)/se
ggplot(subset(dd, .par == "wt" & .zeta > -2.6 & .zeta < 2.6),
       aes(x = .focal, y = .zeta)) +
  geom_point() + geom_line() +
  geom_abline(intercept = -est/se, slope = 1/se, colour = "red") +
  geom_hline(yintercept = c(-1, 1) * 1.96)
ggsave("lmerprof.png")

[Figure: likelihood profile for the wt coefficient (black) with the Wald approximation (red) and horizontal cutoffs at ±1.96 (lmerprof.png)]

The black line shows the likelihood profile. The y-axis shows the signed square root of the deviance difference (i.e., of twice the log-likelihood difference); this is basically a Normal-deviate scale. On this scale, the Wald assumption of a quadratic log-likelihood surface corresponds to a linear profile. The red line shows the Wald approximation.
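A small numeric aside (my addition): the ±1.96 cutoffs drawn on this zeta scale are exactly the Normal-based version of the usual 1-df chi-squared cutoff on the deviance scale, since squaring a standard Normal deviate gives a 1-df chi-squared deviate:

qnorm(0.975)^2        ## 3.841459
qchisq(0.95, df = 1)  ## 3.841459, the 5% LRT cutoff on the deviance scale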

We can also compare the confidence intervals based on the likelihood profile to those based on the Wald approximation (the x-ranges over which each curve stays between the ±1.96 cutoffs on the graph):

Likelihood profile:

confint(pp)["wt",]
##    2.5 %    97.5 % 
##-7.042211 -1.561525 

Wald:

confint(lmer_wt_full,method="Wald")["wt",]
##    2.5 %    97.5 % 
##-6.018525 -2.299228 
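(Again my addition, just to connect the numbers:) the Wald interval above is simply the estimate plus or minus 1.96 standard errors, using the est and se values extracted from coef(summary(lmer_wt_full)) earlier:

est + c(-1, 1) * qnorm(0.975) * se
## -6.018525 -2.299228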

[1] In many cases this is based on the t-distribution instead, but that gets us into harder issues about how to estimate the degrees of freedom.

[2] If we wanted, we could find an equivalent chi-squared test, but this is more typically done with Normal statistics.

Ben Bolker