10

In a regression model is it possible to include an interaction with only one dummy variable of a factor? For example, suppose I have:

x: numerical vector taking 3 values (1, 2 and 3)
y: response variable
z: numerical vector

Is it possible to build a model like:

y ~ factor(x) + factor(x) : z

but only include the interaction with one level of x? I realize that I could create a separate dummy variable for each level of x, but I would like to simplify things if possible.

Really appreciate any input!!

Arun
user2081788
  • 1
    Why would you want to do this? This seems nonsensical at first blush. – gung - Reinstate Monica Feb 18 '13 at 03:00
  • 1
    Perhaps it is nonsensical. I am still in the learning phase, but I couldn't find an answer to my immediate issue anywhere. To be clearer, I have a Cox proportional hazards model where I suspect only one of my categorical variables interacts with time. If I include that as a dummy variable, it complicates the survfit function, as the "newdata" must include the dummy variable. – user2081788 Feb 18 '13 at 03:08
  • Why not reshape the data a bit and create a new data.frame only including the interactions needed? – Ricardo Saporta Feb 18 '13 at 03:14
  • It's more a question of simplicity. My understanding is that R regresses each category in a factor against the response variable as a dummy variable, hence coefficients are estimated for each category. There must be some simple way to tell R to regress only one of the dummy variables within a factor against the response? – user2081788 Feb 18 '13 at 03:20
  • 2
    It may well be that only 1 group (out of 3) changes over time, but little is lost by including all 3 levels in the interaction. One of your levels will be held out as a reference group against which the others will be compared. You use 2 degrees of freedom to estimate the interaction; if you do it the way you want, you'll use 1. I.e., you save only 1 df; even if this made sense, it can hardly be worth the trouble. – gung - Reinstate Monica Feb 18 '13 at 03:23
  • 1
    Right! Definitely. Unfortunately my case is unique right now in that when all three interactions are tested I run into this problem outlined by someone here: https://stat.ethz.ch/pipermail/r-help/2008-September/174201.html - the model won't converge. However, if I do things in the long way and only include 1 interaction - I find it will converge just fine. Now I want to clean up my syntax and make it easier for further investigations – user2081788 Feb 18 '13 at 03:27

4 Answers

6

One key point you're missing is that when you see a significant effect for something like x2:z, that doesn't mean that x interacts with z when x == 2, it means that the difference between x == 2 and x == 1 (or whatever your reference level is) interacts with z. It's not a level of x that is interacting with z, it's one of the contrasts that has been set for x.

So for a 3 level factor with default treatment contrasts:

df <- data.frame(x = sample(1:3, 10, TRUE), y = rnorm(10), z = rnorm(10))
df$x <- factor(df$x)
contrasts(df$x)
  2 3
1 0 0
2 1 0
3 0 1

if you really think that only the first contrast is important, you can create a new variable that compares x == 2 to x == 1, and ignores x == 3:

df$x_1vs2 <- NA
df$x_1vs2[df$x == 1] <- 0
df$x_1vs2[df$x == 2] <- 1
df$x_1vs2[df$x == 3] <- NA

And then run your regression using that:

lm(y ~ x_1vs2 + x_1vs2:z, data = df)
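
To illustrate the point about contrasts, here is a minimal sketch on made-up data (the data frame `dat`, the seed, and the 0.5 effect size are assumptions for illustration, not taken from the question). It checks that, under default treatment contrasts, the x2:z coefficient from y ~ x * z equals the z slope within x == 2 minus the z slope within x == 1:

```r
set.seed(42)  # arbitrary seed so the sketch is reproducible

# Made-up data: 20 rows per level of x, with a z effect only in level 2
dat <- data.frame(x = factor(rep(1:3, each = 20)), z = rnorm(60))
dat$y <- rnorm(60) + ifelse(dat$x == "2", 0.5 * dat$z, 0)

# Full interaction model with default treatment contrasts (reference: x == 1)
full <- lm(y ~ x * z, data = dat)

# Slope of z fitted separately within level 1 and within level 2
slope1 <- coef(lm(y ~ z, data = subset(dat, x == "1")))[["z"]]
slope2 <- coef(lm(y ~ z, data = subset(dat, x == "2")))[["z"]]

# The x2:z coefficient is the difference between the two within-level slopes
interaction_coef <- coef(full)[["x2:z"]]
all.equal(interaction_coef, slope2 - slope1)
```

Changing the reference level (for example with relevel()) changes which differences the interaction coefficients represent.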
Marius
1

If x is already coded as a factor in your data, something like

y ~ x + I(x=='some_level'):z

Or if x is of numeric type in your data frame, then

y ~ as.factor(x) + I(as.factor(x)=='some_level'):z

Or to only model some subset of the data try:

lm(y ~ z, data = subset(df, x=='some_level'))
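
One caveat about the I(x=='some_level') form above: R handles a logical term like a two-level factor, so when z has no main effect in the formula, the interaction gets one z slope for TRUE and one for FALSE. A possible workaround, sketched here on made-up data (the data frame `dat`, the seed, and the choice of level "2" are assumptions for illustration), is to convert the indicator to a numeric 0/1 variable so that only one interaction column is built:

```r
set.seed(1)  # arbitrary seed so the sketch is reproducible

# Made-up data: 20 rows per level of x
dat <- data.frame(x = factor(rep(1:3, each = 20)), z = rnorm(60), y = rnorm(60))

# Logical indicator: treated like a two-level factor, so z gets two slopes
fit_logical <- lm(y ~ x + I(x == "2"):z, data = dat)

# Numeric 0/1 indicator: a single column equal to z where x == 2, else 0
fit_numeric <- lm(y ~ x + I(as.numeric(x == "2")):z, data = dat)

names(coef(fit_logical))  # intercept, x2, x3, and two interaction terms
names(coef(fit_numeric))  # intercept, x2, x3, and one interaction term
```

Note that the numeric version constrains the slope of z to zero for every other level of x, which may or may not be a reasonable assumption for the data at hand.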
Gary Weissman
0
X <- data.frame(x = sample(1:3, 10, TRUE), y = rnorm(10), z = rnorm(10))
lm(y ~ factor(x) + factor(x):z, data=X)

Is this what you want?

wush978
  • In this case each level in factor(x) will interact with z if my understanding is correct. I'm looking for a way to just have an interaction of the first level of factor(x) with z. (i.e. end up with only 1 interaction coefficient) – user2081788 Feb 18 '13 at 02:52
0

Something like this may be what you need:

y ~ factor(x) + factor(x == 'SomeLevel'):z
Matthew Lundberg
  • 1
    This is definitely the concept I am looking for. However, this breaks x down into TRUE and FALSE and tests both as interactions with z; the result is a model with both x=somelevel:z and x!=somelevel:z. I'm trying to include just x=somelevel:z. – user2081788 Feb 18 '13 at 03:03
  • What to do with the rest? Perhaps you need to partition the data, and only model where x==somelevel? – Matthew Lundberg Feb 18 '13 at 03:10
  • Results in `fixed-effect model matrix is rank deficient so dropping 1 column / coefficient` – theforestecologist May 07 '18 at 03:10