
The problem: I cannot remove a lower-order parameter (e.g., a main-effects parameter) from a model as long as the higher-order parameters (i.e., interactions) remain in the model. Even when I do remove it, the model is refactored, and the new model is not nested in the larger model.
See the following example (as I come from ANOVAs, I use contr.sum):

d <- data.frame(A = rep(c("a1", "a2"), each = 50), B = c("b1", "b2"), value = rnorm(100))
options(contrasts=c('contr.sum','contr.poly'))
m1 <- lm(value ~ A * B, data = d)
m1

## Call:
## lm(formula = value ~ A * B, data = d)
## 
## Coefficients:
## (Intercept)           A1           B1        A1:B1  
##   -0.005645    -0.160379    -0.163848     0.035523  

m2 <- update(m1, .~. - A)
m2

## Call:
## lm(formula = value ~ B + A:B, data = d)

## Coefficients:
## (Intercept)           B1       Bb1:A1       Bb2:A1  
##   -0.005645    -0.163848    -0.124855    -0.195902  

As can be seen, although I remove one parameter (A), the new model (m2) is refactored and is not nested in the bigger model (m1). If I transform my factors by hand into numerical contrast variables I can get the desired results (see the sketch below), but how do I get this using R's factor capabilities?
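For example, with the sum-to-zero coding done by hand (the numeric variables a and b below are my own ±1 recodings of A and B), the reduced model has only three parameters and really is nested:

## by-hand numeric contrast coding
d$a <- ifelse(d$A == "a1", 1, -1)    # contr.sum coding of A
d$b <- ifelse(d$B == "b1", 1, -1)    # contr.sum coding of B
m1n <- lm(value ~ a * b, data = d)   # same fit as m1 (4 parameters)
m2n <- lm(value ~ b + a:b, data = d) # 3 parameters, truly nested in m1n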

The Question: How can I remove a lower-order term in R and obtain a model that genuinely lacks this parameter and is not refactored (i.e., the number of parameters in the smaller model must be lower)?


But why? I want to obtain 'Type 3'-like p-values for an lmer model using the KRmodcomp function from the pbkrtest package. So this example is really just an example.
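For reference, the intended end use looks roughly like this (m_full and m_small are placeholders for nested lmer fits):

library(pbkrtest)
## Kenward-Roger corrected F test of a full model against a nested one
KRmodcomp(m_full, m_small)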

Why not CrossValidated? I have the feeling that this is really more of an R than a stats question (i.e., I know that you should never fit a model with interactions but without one of the main effects, but I still want to do it).

Henrik
  • Read Bill Venables [http://www.stats.ox.ac.uk/pub/MASS3/Exegeses.pdf](http://www.stats.ox.ac.uk/pub/MASS3/Exegeses.pdf) on type III sums of squares. It is a stats question. – mnel Jul 05 '12 at 00:22
  • One way to do this is to construct the full design matrix (using `model.matrix`), delete the columns you don't want, and then fit the model to the remaining columns. I'll do an example if/when I get a chance ... – Ben Bolker Jul 05 '12 at 02:03
  • Have a look at the [`MixMod` package](http://cran.r-project.org/web/packages/MixMod/). Base `R` will not support this (see my earlier comment re Bill Venables). – mnel Jul 05 '12 at 05:28
  • I know that the topic of type III sums of squares is a sensitive one to the stats community. However, I specifically did not ask for any statistical advice. I have a technical question on how to do a certain thing in R. You should know that my field (Psychology) expects type 3, so I would like to deliver type 3 results. Furthermore, why the downvote? My question is reasonably formulated, contains a reproducible example, and is specific. As there is no explanation, it seems to be based on a dislike of type 3 results. This is rather unreasonable. – Henrik Jul 05 '12 at 06:59
  • @Henrik I take your point (Type III expected in your field), but to me that is just about the single worst reason for doing something: just because the others do it. Did you read Venables' Exegeses as linked to by mnel? It has been a while since I read it, but IIRC it went far beyond mere dislike of Type III SS. – Gavin Simpson Jul 05 '12 at 08:46
  • @GavinSimpson I fully agree that it is a bad reason. But my current work only uses stats as a tool; it does not deal with stats per se. Hence, I want to focus on the relevant content without dealing too much with the stats part (using mixed models is already ambitious). Therefore, it would be great if I could use the mixed model in a way that is common. But I will try to read the Exegeses this week. – Henrik Jul 05 '12 at 09:07
  • @DWin I think the up votes are because the question is well formulated. Statistical merits aside, Henrik gave example data, clearly demonstrated the problem, and laid out his goals. You can just about count on one hand the number of questions that come to R-help where the asker has put in that much effort to be clear. – Joshua Jul 08 '12 at 20:39
  • My new package [afex](http://cran.r-project.org/web/packages/afex/index.html), which uses the answers given here in function `mixed()`, was just accepted on CRAN. – Henrik Aug 10 '12 at 13:49

2 Answers


Here's a sort of answer; there is no way that I know of to formulate this model directly with a formula ...

Construct data as above:

d <- data.frame(A = rep(c("a1", "a2"), each = 50),
                B = c("b1", "b2"), value = rnorm(100))
options(contrasts=c('contr.sum','contr.poly'))

Confirm original finding that just subtracting the factor from the formula doesn't work:

m1 <- lm(value ~ A * B, data = d)
coef(m1)
## (Intercept)          A1          B1       A1:B1 
## -0.23766309  0.04651298 -0.13019317 -0.06421580 

m2 <- update(m1, .~. - A)
coef(m2)
## (Intercept)          B1      Bb1:A1      Bb2:A1 
## -0.23766309 -0.13019317 -0.01770282  0.11072877 

Formulate the new model matrix:

X0 <- model.matrix(m1)
## drop the A main-effect column (A1) from the model matrix
## (keep the intercept column: lm.fit does not add one automatically)
X1 <- X0[,!colnames(X0) %in% "A1"]

lm.fit allows direct specification of the model matrix:

m3 <- lm.fit(x=X1,y=d$value)
coef(m3)
## (Intercept)          B1       A1:B1 
## -0.2376631  -0.1301932  -0.0642158 

This method only works for a few special cases that allow the model matrix to be specified explicitly (e.g. lm.fit, glm.fit).
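The same idea carries over to GLMs; a minimal sketch with glm.fit (gaussian family just to mirror the lm fit above):

## glm.fit also takes the model matrix directly and returns a plain list
m3g <- glm.fit(x = X1, y = d$value, family = gaussian())
coef(m3g)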

More generally:

## need to drop intercept column (or use -1 in the formula)
X1 <- X1[,!colnames(X1) %in% "(Intercept)"]
## ':' in column names will confuse the formula -- substitute something inert
colnames(X1) <- gsub(":","_int_",colnames(X1))
newf <- reformulate(colnames(X1),response="value")
m4 <- lm(newf,data=data.frame(value=d$value,X1))
coef(m4)
## (Intercept)          B1   A1_int_B1 
##  -0.2376631  -0.1301932  -0.0642158 

This approach has the disadvantage that it won't recognize multiple model-matrix columns as stemming from the same predictor (i.e., the several contrast columns generated by a factor with more than two levels).
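Since the point of building the reduced model is a nested comparison, the reduced fit can be tested against the full fit in the usual way; a quick sketch (both models use the same response, so anova() gives the 1-df F test for the dropped A1 column):

anova(m4, m1)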

Ben Bolker
  • Thanks for the great answer. I will accept your answer (I think both are similar), as you show how to construct the formula and mention the problem of categorical predictors with more than two levels. – Henrik Jul 13 '12 at 14:13

I think the most straightforward solution is to use model.matrix. Possibly you could achieve what you want with some fancy footwork and custom contrasts. However, if you want "type 3 esque" p-values, you probably want them for every term in your model, in which case my model.matrix approach is convenient anyway, because you can easily loop through all the models, dropping one column at a time. Providing a possible approach is not an endorsement of its statistical merits, but you formulated a clear question and seem to know it may be unsound statistically, so I see no reason not to answer it.

## initial data
set.seed(10)
d <- data.frame(
  A = rep(c("a1", "a2"), each = 50),
  B = c("b1", "b2"),
  value = rnorm(100))

options(contrasts=c('contr.sum','contr.poly'))

## create design matrix
X <- model.matrix(~ A * B, data = d)

## fit models dropping one effect at a time
## change 1:ncol(X) to 2:ncol(X)
## to avoid fitting a no-intercept model
m <- lapply(1:ncol(X), function(i) {
  lm(value ~ 0 + X[, -i], data = d)
})
## fit (and store) the full model
m$full <- lm(value ~ 0 + X, data = d)
## fit the full model in usual way to compare
## full and regular should be equivalent
m$regular <- lm(value ~ A * B, data = d)
## extract and view coefficients
lapply(m, coef)

This results in the following output:

[[1]]
   X[, -i]A1    X[, -i]B1 X[, -i]A1:B1 
  -0.2047465   -0.1330705    0.1133502 

[[2]]
X[, -i](Intercept)          X[, -i]B1       X[, -i]A1:B1 
        -0.1365489         -0.1330705          0.1133502 

[[3]]
X[, -i](Intercept)          X[, -i]A1       X[, -i]A1:B1 
        -0.1365489         -0.2047465          0.1133502 

[[4]]
X[, -i](Intercept)          X[, -i]A1          X[, -i]B1 
        -0.1365489         -0.2047465         -0.1330705 

$full
X(Intercept)          XA1          XB1       XA1:B1 
  -0.1365489   -0.2047465   -0.1330705    0.1133502 

$regular
(Intercept)          A1          B1       A1:B1 
 -0.1365489  -0.2047465  -0.1330705   0.1133502 

That works nicely for models fit with lm. You mentioned this is ultimately for lmer(), so here is an example using mixed models. I believe it becomes more complex if you have more than a random intercept (i.e., effects then need to be dropped from both the fixed and the random parts of the model).

## mixed example
require(lme4)

## the data are a bit trickier: n and ID are assigned inline inside data.frame()
set.seed(10)
mixed <- data.frame(
  ID = factor(ID <- rep(seq_along(n <- sample(3:8, 60, TRUE)), n)),
  A = sample(c("a1", "a2"), length(ID), TRUE),
  B = sample(c("b1", "b2"), length(ID), TRUE),
  value = rnorm(length(ID), 3) + rep(rnorm(length(n)), n))

## model matrix as before
X <- model.matrix(~ A * B, data = mixed)

## as before but allowing a random intercept by ID
## becomes trickier if you need to add/drop random effects too
## and I do not show an example of this
mm <- lapply(1:ncol(X), function(i) {
  lmer(value ~ 0 + X[, -i] + (1 | ID), data = mixed)
})

## full model
mm$full <- lmer(value ~ 0 + X + (1 | ID), data = mixed)
## full model regular way
mm$regular <- lmer(value ~ A * B + (1 | ID), data = mixed)

## view all the fixed effects
lapply(mm, fixef)

Which gives us...

[[1]]
   X[, -i]A1    X[, -i]B1 X[, -i]A1:B1 
 0.009202554  0.028834041  0.054651770 

[[2]]
X[, -i](Intercept)          X[, -i]B1       X[, -i]A1:B1 
        2.83379928         0.03007969         0.05992235 

[[3]]
X[, -i](Intercept)          X[, -i]A1       X[, -i]A1:B1 
        2.83317191         0.02058800         0.05862495 

[[4]]
X[, -i](Intercept)          X[, -i]A1          X[, -i]B1 
        2.83680235         0.01738798         0.02482256 

$full
X(Intercept)          XA1          XB1       XA1:B1 
  2.83440919   0.01947658   0.02928676   0.06057778 

$regular
(Intercept)          A1          B1       A1:B1 
 2.83440919  0.01947658  0.02928676  0.06057778 
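Tying this back to the original goal, a hedged sketch of how the per-term Kenward-Roger tests might then be obtained from these fits (assuming KRmodcomp accepts matrix-based lmer fits like these; krtests is an ad-hoc name):

require(pbkrtest)
## compare the full model against each reduced model;
## skip index 1, the no-intercept model, as noted above
krtests <- lapply(2:ncol(X), function(i) KRmodcomp(mm$full, mm[[i]]))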
Joshua
  • Thanks a lot for the great answer. I will award you the 100 points (as you specifically showed how to use `lmer`) but will accept Ben Bolker's answer (see there for the rationale). – Henrik Jul 13 '12 at 14:09