Why does changing contrast type change row labels in R lm summary?

Question

With the default contrasts in R (contr.treatment), the summary of a linear model object gives row names according to the level names. When I change the contrasts to contr.sum, the summary of the linear model object gives row names according to made up numbers.

For the example code below, the coefficient rows after the intercept are named xb xc xd xe with treatment contrasts, but x1 x2 x3 x4 with sum contrasts.

Is there a way to make these behave the same way besides manually renaming the rows?

EXAMPLE:

y <- rnorm(10, 0, 1)
x <- factor(rep(letters[1:5], each = 2))

options(contrasts = c("contr.treatment", "contr.poly"))
summary(lm(y ~ x))

options(contrasts = c("contr.sum", "contr.poly"))
summary(lm(y ~ x))

But they aren't the same, are they? Using different contrasts means that the coefficients have different interpretations, so why would you label them the same? — joran, May 30 '12 at 01:37
@joran, would it change your interpretation if the rows were labeled with the group names? — Jdub, May 30 '12 at 03:11
Well, the interpretations are different, regardless of how they are labelled in R's output. If you simply want to label them differently, then I think you're stuck making the change after the fact. I was just concerned that you were interpreting sum contrasts the same way you interpret treatment contrasts when they aren't quite the same thing. — joran, May 30 '12 at 03:18
I spent a little while on this and traced it back as far as `model.matrix`. It only applies if the contrasts are set to something other than the default. I agree with @joran, I think you're stuck. (You can also set the contrasts explicitly and name their columns, *then* assign them to specific factors.) — Ben Bolker, May 30 '12 at 07:58
Look at `contr.Treatment` in the `car` package for how to make a contrast function with different names for the contrasts. — Brian Diggs, May 30 '12 at 21:59
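Following Ben Bolker's suggestion above, here is a minimal sketch of naming the contrast columns by hand and then assigning the matrix to a specific factor. It reuses `x` and `y` from the question; the name `cm` is only illustrative, and labelling the columns with the first four level names is an assumption about which labels are wanted.

```r
set.seed(1)
y <- rnorm(10, 0, 1)
x <- factor(rep(letters[1:5], each = 2))

# Build the sum-contrast matrix for the levels of x; it has no column names
cm <- contr.sum(levels(x))

# Assumption: label each column with the level it codes (all but the last)
colnames(cm) <- levels(x)[-nlevels(x)]

# Attach the named matrix to this factor only; global options are untouched
contrasts(x) <- cm

fit <- lm(y ~ x)
names(coef(fit))
# "(Intercept)" "xa" "xb" "xc" "xd"
```

Because the matrix is attached to `x` itself, other factors in the model keep whatever contrasts `options(contrasts = ...)` specifies.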

Alexander Shenkin · Answer 1 · 2016-02-16T13:12:57.643

I like your solution @Aaron, and have implemented it, but I think it contains a dangerous error. The sum contrasts give you the differences between each of the first n-1 levels and the grand mean, not the last n-1 levels, which is what your naming algorithm returns. See Crawley's R Book, 2nd Edition, pages 442-443.

Thus, I believe the correct function should instead be:

contr.sum.keepnames <- function(...) {
    conS <- contr.sum(...)
    # Label each column with the level it codes: every level except the last
    colnames(conS) <- rownames(conS)[-nrow(conS)]
    conS
}
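One way to check which levels the sum-contrast coefficients belong to is to compare them with the group means. The sketch below reuses the simulated data from the accepted answer (the names `group_means` and `deviations` are illustrative). It relies on the fact that, in a one-way model with sum contrasts, the intercept is the unweighted mean of the group means.

```r
set.seed(5)
n <- 5
y <- c(10 + rnorm(n, 0, 1), 20 + rnorm(n, 0, 1), 30 + rnorm(n, 0, 1))
wFactor <- factor(rep(c("A", "B", "C"), each = n))

contr.sum.keepnames <- function(...) {
    conS <- contr.sum(...)
    colnames(conS) <- rownames(conS)[-nrow(conS)]
    conS
}

fit <- lm(y ~ wFactor,
          contrasts = list(wFactor = contr.sum.keepnames(levels(wFactor))))

# Deviation of each group mean from the mean of the group means
group_means <- tapply(y, wFactor, mean)
deviations <- group_means - mean(group_means)

# The two non-intercept coefficients match the deviations for A and B, not B and C
all.equal(unname(coef(fit)[-1]), as.vector(deviations[c("A", "B")]))
names(coef(fit))
# "(Intercept)" "wFactorA" "wFactorB"
```

The deviation for the last level, C, is not reported directly; it equals minus the sum of the other two coefficients.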

BTW, I tried adding this as a comment, but had difficulty adding a code block within the comment.

score 1 · Accepted Answer · answered Jun 01 '12 at 16:27

I'm still not at all sure this is a good idea; I think the possibility of getting confused about what the contrasts mean is too high. Still, what I would do is make a new contrasts function that computes sum contrasts but sets the column names equal to the default names from the treatment contrasts.

set.seed(5)
n <- 5
y <- c(10 + rnorm(n, 0, 1), 20 + rnorm(n, 0, 1), 30 + rnorm(n, 0, 1))
wFactor <- as.factor(c(rep("A", n), rep("B", n), rep("C", n)))

contr.sumX <- function(...) {
  conT <- contr.treatment(...)
  conS <- contr.sum(...)
  colnames(conS) <- colnames(conT)
  conS
}

For reference, here's the usual output:

> m1 <- lm(y ~ wFactor, contrasts = list(wFactor=contr.sum(n = levels(wFactor))))
> coef(summary(m1))
              Estimate Std. Error     t value     Pr(>|t|)
(Intercept) 19.8218432  0.2481727  79.8711599 9.889455e-18
wFactor1    -9.6079241  0.3509692 -27.3754029 3.480430e-12
wFactor2    -0.1934654  0.3509692  -0.5512319 5.915907e-01

And here's the output with the contr.sumX function.

> m2 <- lm(y ~ wFactor, contrasts = list(wFactor=contr.sumX(n = levels(wFactor))))
> coef(summary(m2))
              Estimate Std. Error     t value     Pr(>|t|)
(Intercept) 19.8218432  0.2481727  79.8711599 9.889455e-18
wFactorB    -9.6079241  0.3509692 -27.3754029 3.480430e-12
wFactorC    -0.1934654  0.3509692  -0.5512319 5.915907e-01

Alternately, you can set the contrasts for a particular factor ahead of time:

> contrasts(wFactor) <- "contr.sumX"
> m3 <- lm(y ~ wFactor)
> coef(summary(m3))
              Estimate Std. Error     t value     Pr(>|t|)
(Intercept) 19.8218432  0.2481727  79.8711599 9.889455e-18
wFactorB    -9.6079241  0.3509692 -27.3754029 3.480430e-12
wFactorC    -0.1934654  0.3509692  -0.5512319 5.915907e-01
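Since relabelled rows are easy to misread, it may help to inspect the two contrast matrices directly before borrowing names from one for the other. This is only a sketch; `lv`, `conT` and `conS` are illustrative names, and the last line simply reports which level carries the +1 in each column.

```r
lv <- c("A", "B", "C")

conT <- contr.treatment(lv)  # columns are named after the non-baseline levels
conS <- contr.sum(lv)        # columns are unnamed, hence the 1, 2, ... labels

colnames(conT)
# "B" "C"
colnames(conS)
# NULL

# Level that each column singles out with a +1
unname(rownames(conT)[apply(conT, 2, which.max)])
# "B" "C"
unname(rownames(conS)[apply(conS, 2, which.max)])
# "A" "B"
```

The treatment columns single out the last n-1 levels, whereas the sum-contrast columns single out the first n-1, so names copied from `contr.treatment` onto `contr.sum` are shifted by one level.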
