R - interaction with only one factor level in regression

Question

In a regression model is it possible to include an interaction with only one dummy variable of a factor? For example, suppose I have:

x: numerical vector taking 3 values (1, 2 and 3)
y: response variable
z: numerical vector

Is it possible to build a model like:

y ~ factor(x) + factor(x) : z

but only include the interaction with one level of x? I realize that I could create a separate dummy variable for each level of x, but I would like to simplify things if possible.

Really appreciate any input!!

Why would you want to do this? This seems nonsensical at first blush. – gung - Reinstate Monica, Feb 18 '13 at 03:00
Perhaps it is nonsensical. I am still in the learning phase, but I couldn't find any answers to my immediate issue anywhere. To be clearer, I have a Cox proportional hazards model where I suspect that only one level of my categorical variable interacts with time. If I include that as a dummy variable, it complicates the survfit function, as the "newdata" must include the dummy variable. – user2081788, Feb 18 '13 at 03:08
Why not reshape the data a bit and create a new data.frame only including the interactions needed? – Ricardo Saporta, Feb 18 '13 at 03:14
It's more a question of simplicity. My understanding is that R regresses each category in a factor against the response variable as a dummy variable, hence coefficients are estimated for each category. There must be some simple way to tell R to regress only one of the dummy variables within a factor against the response? – user2081788, Feb 18 '13 at 03:20
It may well be that only 1 group (out of 3) changes over time, but little is lost by including all 3 levels in the interaction. One of your levels will be held out as a reference group against which the others will be compared. You use 2 degrees of freedom to estimate the interaction; if you do it the way you want, you'll use 1. I.e., you save only 1 df; even if this made sense, it can hardly be worth the trouble. – gung - Reinstate Monica, Feb 18 '13 at 03:23
Right! Definitely. Unfortunately my case is unusual right now, in that when all three interactions are tested I run into the problem outlined here: https://stat.ethz.ch/pipermail/r-help/2008-September/174201.html (the model won't converge). However, if I do things the long way and include only 1 interaction, I find it converges just fine. Now I want to clean up my syntax and make it easier for further investigations. – user2081788, Feb 18 '13 at 03:27

score 6 · Accepted Answer · answered Feb 18 '13 at 03:41

One key point you're missing is that when you see a significant effect for something like x2:z, that doesn't mean that x interacts with z when x == 2, it means that the difference between x == 2 and x == 1 (or whatever your reference level is) interacts with z. It's not a level of x that is interacting with z, it's one of the contrasts that has been set for x.
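To make the point about contrasts concrete, here is a minimal sketch on simulated data. The data frame `d`, the seed, and the balanced design are assumptions for illustration only and are not part of the question. Under default treatment contrasts in a full `x * z` model, the `x2:z` coefficient should equal the difference between the z slope at x == 2 and the z slope at the reference level x == 1:

```r
set.seed(42)  # assumed seed, only for reproducibility

# Simulated data: 3-level factor x, numeric predictor z, response y
d <- data.frame(x = factor(rep(1:3, each = 10)),
                y = rnorm(30),
                z = rnorm(30))

# Full model: main effects plus interaction, default treatment contrasts
fit <- lm(y ~ x * z, data = d)
names(coef(fit))

# Slope of z fitted separately within level 1 (reference) and level 2
slope_ref <- unname(coef(lm(y ~ z, data = subset(d, x == 1)))["z"])
slope_2   <- unname(coef(lm(y ~ z, data = subset(d, x == 2)))["z"])
diff_slopes <- slope_2 - slope_ref

# The interaction coefficient is the difference in slopes, not the slope at x == 2
all.equal(unname(coef(fit)["x2:z"]), diff_slopes)
```

The separate per-level fits are used here only as a check; they reproduce the slopes of the full interaction model because that model gives each level its own intercept and slope.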

So for a 3-level factor with default treatment contrasts:

df <- data.frame(x = sample(1:3, 10, TRUE), y = rnorm(10), z = rnorm(10))
df$x <- factor(df$x)
contrasts(df$x)
  2 3
1 0 0
2 1 0
3 0 1

if you really think that only the first contrast is important, you can create a new variable that compares x == 2 to x == 1, and ignores x == 3:

df$x_1vs2 <- NA
df$x_1vs2[df$x == 1] <- 0
df$x_1vs2[df$x == 2] <- 1
df$x_1vs2[df$x == 3] <- NA

And then run your regression using that:

lm(y ~ x_1vs2 + x_1vs2:z, data = df)
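One consequence of this recoding is worth checking before relying on it: rows where the new variable is NA are dropped from the fit under R's default `na.action`. The sketch below uses simulated data (the data frame `d` and the seed are assumptions, not the asker's data) to show how many observations and coefficients remain:

```r
set.seed(1)  # assumed seed, only for reproducibility

d <- data.frame(x = factor(rep(1:3, each = 10)),
                y = rnorm(30),
                z = rnorm(30))

# 0 for x == 1, 1 for x == 2, NA for x == 3 (same coding as above)
d$x_1vs2 <- ifelse(d$x == 1, 0, ifelse(d$x == 2, 1, NA))

fit <- lm(y ~ x_1vs2 + x_1vs2:z, data = d)

n_used <- nobs(fit)        # rows actually used in the fit
coef_names <- names(coef(fit))
n_used
coef_names
```

With 10 rows per level, only the 20 rows with x == 1 or x == 2 contribute, so this model says nothing about level 3.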

Thanks! That explains a lot! – user2081788, Feb 18 '13 at 03:52

score 1 · Answer 2 · Gary Weissman · Feb 18 '13 at 03:13

If x is already coded as a factor in your data, try something like

y ~ x + I(x=='some_level'):z

Or if x is of numeric type in your data frame, then

y ~ as.factor(x) + I(as.factor(x)=='some_level'):z

Or, to model only a subset of the data (x is constant within the subset, so it drops out of the formula), try:

lm(y ~ z, data = subset(df, x=='some_level'))

edited Feb 18 '13 at 03:13

answered Feb 18 '13 at 03:03

Gary Weissman


This solution seems to have the same issue as Matthew Lundberg's – user2081788 Feb 18 '13 at 03:06
If you only want to analyze the subset of data that includes x=='some_level', then subset your data... added to my answer above – Gary Weissman Feb 18 '13 at 03:11
Thanks! I appreciate the help. Unfortunately I am not trying to model just a subset. See comments under main question for more details – user2081788 Feb 18 '13 at 03:21

score 0 · Answer 3 · answered Feb 18 '13 at 02:49

X <- data.frame(x = sample(1:3, 10, TRUE), y = rnorm(10), z = rnorm(10))
lm(y ~ factor(x) + factor(x):z, data=X)

Is this what you want?

wush978


In this case each level in factor(x) will interact with z if my understanding is correct. I'm looking for a way to just have an interaction of the first level of factor(x) with z. (i.e. end up with only 1 interaction coefficient) – user2081788 Feb 18 '13 at 02:52
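The comment's reading can be checked directly. The sketch below assumes simulated data (the data frame `d` and the seed are illustrative, not from the thread). Because the formula has no main effect for z, R estimates one z slope per level of x, giving three interaction coefficients:

```r
set.seed(7)  # assumed seed, only for reproducibility

d <- data.frame(x = rep(1:3, each = 10),
                y = rnorm(30),
                z = rnorm(30))

fit <- lm(y ~ factor(x) + factor(x):z, data = d)

# Intercept, two level contrasts, and one z slope for each of the 3 levels
coef_names <- names(coef(fit))
coef_names
```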

score 0 · Answer 4 · answered Feb 18 '13 at 02:54

Something like this may be what you need:

y~factor(x)+factor(x=='SomeLevel'):z

Matthew Lundberg


This is definitely the concept I am looking for. However, this breaks x down into TRUE and FALSE and tests both as an interaction with z. The result is a model with both x=somelevel:z and x!=somelevel:z, whereas I'm trying to include just x=somelevel:z. – user2081788 Feb 18 '13 at 03:03
What to do with the rest? Perhaps you need to partition the data, and only model where x==somelevel? – Matthew Lundberg Feb 18 '13 at 03:10
Results in `fixed-effect model matrix is rank deficient so dropping 1 column / coefficient` – theforestecologist May 07 '18 at 03:10
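One possible way around the TRUE/FALSE split described in these comments is to use a numeric 0/1 indicator instead of a logical or factor one, so the interaction contributes a single column. This is a sketch under assumptions, not something proposed in the thread: the data frame `d`, the column name `is2`, the choice of level "2", and the seed are all hypothetical.

```r
set.seed(3)  # assumed seed, only for reproducibility

d <- data.frame(x = factor(rep(1:3, each = 10)),
                y = rnorm(30),
                z = rnorm(30))

# factor(x == "2") is a two-level factor, so the interaction gets a FALSE and a TRUE column
mm_factor <- model.matrix(~ x + factor(x == "2"):z, data = d)

# Hypothetical numeric indicator: 1 where x == "2", 0 elsewhere
d$is2 <- as.numeric(d$x == "2")
mm_numeric <- model.matrix(~ x + is2:z, data = d)

ncol(mm_factor)   # includes two interaction columns
ncol(mm_numeric)  # includes one interaction column

fit <- lm(y ~ x + is2:z, data = d)

# Same model written inline, without adding a column to the data
fit_inline <- lm(y ~ x + I((x == "2") * z), data = d)
names(coef(fit))
```

The inline form avoids carrying an extra column in the data, which may matter when a later step needs a `newdata` frame; whether it behaves as wanted with a particular modelling function should be checked separately. Note also that this model has no z slope for the other levels, which is a substantive assumption rather than a syntax detail.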
