
I know there is a shortcut in R to run an lm() regression on all the columns of a data frame, like this:

reg<-lm(y~.,data=df)

where df contains the explanatory variables x1, x2, ..., x5, so it is the same as writing

reg<-lm(y~x1+x2+x3+x4+x5,data=df)
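One way to see what the dot expands to, without fitting anything, is to ask terms() for the term labels. A minimal sketch, assuming a made-up data frame with a response y and five predictors x1 to x5 (the data are simulated for illustration only):

```r
# Hypothetical example data: response y plus five predictors x1..x5
set.seed(42)
df <- data.frame(y = rnorm(20), x1 = rnorm(20), x2 = rnorm(20),
                 x3 = rnorm(20), x4 = rnorm(20), x5 = rnorm(20))

# terms() expands "." against the columns of df (every column except the response)
dot_terms  <- attr(terms(y ~ ., data = df), "term.labels")
full_terms <- attr(terms(y ~ x1 + x2 + x3 + x4 + x5), "term.labels")

print(dot_terms)                  # main effects only, no interaction terms
identical(dot_terms, full_terms)  # both formulas contain the same terms
```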

But this doesn't include interaction terms like x1:x2. Is there a shortcut in R to run a regression on all columns of a data frame including the interactions? I am looking for two shortcuts that would have the same effect as:

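reg <- lm(y ~ x1*x2 + x1*x3 + x1*x4 + x1*x5 + x2*x3 + x2*x4 + x2*x5 + x3*x4 + x3*x5 + x4*x5, data = df)  # all pairwise interactions
reg <- lm(y ~ x1*x2*x3*x4*x5, data = df)  # all interactions, up to the five-way term x1:x2:x3:x4:x5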
LyzandeR
etienne

2 Answers


The shortcut you are searching for is:

reg <- lm(y ~ (.)^2, data = df)

This will create a model with the main effects and all the two-way (pairwise) interactions between the regressors.
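To check that (.)^2 gives the first model from the question, a sketch with simulated data (the data frame below is made up for illustration) can fit both formulas and compare them. With five predictors there should be 1 intercept + 5 main effects + choose(5, 2) = 10 two-way interactions, i.e. 16 coefficients:

```r
# Hypothetical simulated data with the column names used in the question
set.seed(1)
n <- 50
df <- data.frame(y = rnorm(n), x1 = rnorm(n), x2 = rnorm(n),
                 x3 = rnorm(n), x4 = rnorm(n), x5 = rnorm(n))

# Shortcut: all main effects plus all pairwise interactions
reg_dot <- lm(y ~ (.)^2, data = df)

# The same model written out by hand
reg_explicit <- lm(y ~ x1*x2 + x1*x3 + x1*x4 + x1*x5 + x2*x3 +
                     x2*x4 + x2*x5 + x3*x4 + x3*x5 + x4*x5, data = df)

length(coef(reg_dot))                                # intercept + 5 + 10 terms
all.equal(fitted(reg_dot), fitted(reg_explicit))     # identical fits
```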

Rui Barradas
Gena
    Welcome to Stack Overflow! Thank you for this code snippet, which might provide some limited, immediate help. A [proper explanation would greatly improve its long-term value](//meta.stackexchange.com/q/114762/206345) by showing _why_ this is a good solution to the problem, and would make it more useful to future readers with other, similar questions. Please edit your answer to add some explanation, including the assumptions you've made. – Blue Sep 14 '18 at 15:12
  • Remove the asterisks, they are basically useless there and potentially harmful. – Rui Barradas Sep 14 '18 at 17:42

For both cases you can use the ^ operator in the formula.

In your first case you only need the pairwise (two-way) interactions, so you could do:

# Example data frame with four random columns
df <- data.frame(a = runif(100), b = runif(100), c = runif(100), d = runif(100))

> lm(a ~ (b+c+d)^2, data=df)

Call:
lm(formula = a ~ (b + c + d)^2, data = df)

Coefficients:
(Intercept)            b            c            d          b:c          b:d          c:d  
    0.53873      0.23531      0.07813     -0.14763     -0.43130      0.11084      0.13181  

As you can see, this produced the main effects plus all the pairwise interactions.

Now, to include every interaction, up to the three-way term b:c:d, you can do:

> lm(a ~ (b+c+d)^5 , data=df)

Call:
lm(formula = a ~ (b + c + d)^5, data = df)

Coefficients:
(Intercept)            b            c            d          b:c          b:d          c:d        b:c:d  
    0.54059      0.23123      0.07455     -0.15150     -0.42340      0.11926      0.14017     -0.01803  

Here the exponent only needs to be at least the number of variables inside the parentheses: I used 5, but 3 (or anything larger) gives the same model, as does b*c*d. As you can see, all the interactions are produced.
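The point about the exponent can be checked by comparing the term labels that terms() generates for several equivalent formulas. A minimal sketch reusing the example data frame with columns a, b, c and d; the helper term_labels is a hypothetical name introduced only for this illustration:

```r
df <- data.frame(a = runif(100), b = runif(100), c = runif(100), d = runif(100))

# Hypothetical helper: the terms a formula expands to, given the columns of df
term_labels <- function(f) attr(terms(f, data = df), "term.labels")

full <- term_labels(a ~ (b + c + d)^3)
print(full)  # main effects, the three two-way terms and b:c:d

identical(full, term_labels(a ~ (b + c + d)^5))  # a larger power adds nothing
identical(full, term_labels(a ~ b * c * d))      # * is the same shorthand
identical(full, term_labels(a ~ .^3))            # "." stands for b, c and d
```

Since only the formula is expanded here, no model is fitted and the random values in df do not affect the result.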

LyzandeR
    Have a look at [this website](http://ww2.coastal.edu/kingw/statistics/R-tutorials/formulae.html). It lists all the interaction shortcuts mentioned here, including the ones from @LyzandeR. – phiver Oct 22 '15 at 12:51
  • Why exponent of 5? I was thinking 3. – abalter May 20 '20 at 20:33