I have a data set that I read in as follows:

test <- read.csv("data.csv", sep = ",", header = TRUE)

There are 10 predictor variables; the first column is the response variable:

x <- test[, -1]
y <- test[, 1]

To test a model that uses the first three predictor variables along with their interaction terms, here is what I did with lm:

test.model <- lm(y ~ x[,1]*x[,2]*x[,3], data = test)

But it turns out that the resulting model also includes the three-way interaction term x[, 1]:x[, 2]:x[, 3]. How can I limit the model to just the two-factor interactions, such as x[, 1]:x[, 2], x[, 2]:x[, 3] and x[, 1]:x[, 3]?

If I would like to consider all 10 predictor variables, instead of writing x[,1]*x[,2]*x[,3]*x[,4]*...x[,10], is there a convenient way to write this formula?

user785099

2 Answers

You can specify the highest order of interactions with ^.

y ~ (x[,1] + x[,2] + x[,3]) ^ 2

results in all two-variable interactions and main effects.
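
For concreteness, here is a minimal sketch, assuming the response column of test is named y and the first three predictors are named x1, x2 and x3 (the question does not give the actual column names):

# all main effects plus every two-way interaction, no three-way term
fit <- lm(y ~ (x1 + x2 + x3)^2, data = test)
# the formula expands to y ~ x1 + x2 + x3 + x1:x2 + x1:x3 + x2:x3

If every other column of test is a predictor, the same operator combines with the . shorthand, e.g. lm(y ~ (.)^2, data = test).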

Sven Hohenstein

Two points. It makes no sense to extract the predictor and response as separate objects if you are also going to supply a data argument. At worst it will start to fail at strange moments; at a minimum it will confuse your collaborators. The model is also going to be much easier to interpret if you have meaningful column names.
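
As a rough sketch of what that could look like (the column names below are made up, since the question does not show the real ones):

# give the columns meaningful names (placeholders here)
names(test) <- c("y", paste0("x", 1:10))

# refer to columns by name and let data = test supply them,
# instead of indexing into a separately extracted x matrix
test.model <- lm(y ~ x1 * x2 * x3, data = test)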

As Sven points out, you can use the "^" formula operator, which means something quite different from exponentiation. I'm pretty sure this is a duplicate SO question, so I will now do a bit of searching.
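
To illustrate the difference (again with made-up column names): inside a formula, "^" crosses terms up to the given degree, while I() is needed for an ordinary arithmetic power.

# crossing: expands to y ~ x1 + x2 + x1:x2
lm(y ~ (x1 + x2)^2, data = test)

# an actual squared predictor needs I() (or poly())
lm(y ~ x1 + I(x1^2), data = test)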

IRTFM