
I have a data set with 7 explanatory variables (X_1 to X_7) and a response Y, and I want to fit regressions on all possible combinations of them. Specifically, I want to run different equations, each using a different subset of the variables, for instance:

Y = b_0 + b_1*X_1 + b_2*X_2
Y = b_0 + b_1*X_1 + b_2*X_3
Y = b_0 + b_1*X_1 + b_2*X_2 + b_3*X_3
Y = b_0 + b_1*X_1 + b_2*X_2 + b_3*X_3 + b_4*X_4
Y = b_0 + b_1*X_1 + b_2*X_2 + b_3*X_4
Y = b_0 + b_1*X_1 + b_2*X_4

I want all possible combinations, in this kind of order. How should I set up my loop?
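For scale: each of the 7 variables is either in or out of an equation, and the intercept-only model is excluded, so the number of distinct equations is

$$
\sum_{k=1}^{7} \binom{7}{k} = 7 + 21 + 35 + 35 + 21 + 7 + 1 = 2^7 - 1 = 127
$$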

Ronak Shah
  • Thanks! I am very new to Stack Overflow. – Aru Bhardwaj Nov 22 '20 at 05:50
  • Search for "all subsets regression". Fancy data dredging. – IRTFM Nov 22 '20 at 06:19
  • Not recommended. But if you really want to do so, check [this](https://stackoverflow.com/questions/18705153/generate-list-of-all-possible-combinations-of-elements-of-vector). – ekoam Nov 22 '20 at 06:49
  • You can check this out too but there is probably a better way of doing it: https://stackoverflow.com/a/58946505/10142537 – QAsena Nov 22 '20 at 08:09

1 Answer


Generate example data:

set.seed(1)  # make the simulated data reproducible
dat <- data.frame(
  Y = rnorm(100),
  X_1 = rnorm(100),
  X_2 = rnorm(100),
  X_3 = rnorm(100),
  X_4 = rnorm(100),
  X_5 = rnorm(100),
  X_6 = rnorm(100),
  X_7 = rnorm(100)
)

Find all combinations of 1 through 7 variables and paste each one into a formula with Y as the dependent variable:

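# predictor names: every column except Y
variables <- colnames(dat)[2:ncol(dat)]
formulas <- list()
for (i in seq_along(variables)) {
  tmp <- combn(variables, i)                  # each column is one subset of size i
  tmp <- apply(tmp, 2, paste, collapse="+")   # e.g. "X_1+X_3"
  tmp <- paste0("Y~", tmp)                    # e.g. "Y~X_1+X_3"
  formulas[[i]] <- tmp
}
formulas <- unlist(formulas)                  # all 127 formula strings
formulas <- sapply(formulas, as.formula)      # convert each string to a formula object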
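A possibly tidier variant of the paste-and-convert loop, sketched under the assumption that the predictors are named X_1 to X_7 as in the example data: `combn(..., simplify = FALSE)` returns each subset as a character vector, and base R's `reformulate()` turns that vector into a formula directly. This is not the answer's method, just an alternative; `subsets` and `formulas2` are hypothetical names chosen so they don't clash with `formulas`.

```r
# Hypothetical alternative: build the same 127 formulas with combn() + reformulate()
variables <- paste0("X_", 1:7)  # assumed predictor names, as in the example data

# For each subset size k, combn(..., simplify = FALSE) gives a list of character vectors;
# unlist(recursive = FALSE) flattens the 7 lists into one list of 127 subsets
subsets <- unlist(
  lapply(seq_along(variables), function(k) combn(variables, k, simplify = FALSE)),
  recursive = FALSE
)

# reformulate() turns c("X_1", "X_3") into Y ~ X_1 + X_3
formulas2 <- lapply(subsets, reformulate, response = "Y")

length(formulas2)           # 127
deparse(formulas2[[8]])     # "Y ~ X_1 + X_2" (first two-variable subset)
```

The ordering matches the loop: all single-variable models first, then all pairs, and so on up to the full model.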

Estimate all 127 (2^7 - 1) regression models:

models <- lapply(formulas, lm, data=dat)  # one lm() fit per formula
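Once the models are fitted, one way to look at them is a summary table with one row per model. The sketch below rebuilds the simulated data and models so it runs on its own (the `set.seed()` value and the table column names are assumptions) and ranks the fits by AIC. As the comments on the question warn, picking a "best" model this way is data dredging: the selected model's p-values and standard errors no longer mean what they usually do.

```r
# Sketch: fit every subset model and rank the fits by AIC.
# Assumption: simulated data as in the answer; set.seed() only for reproducibility.
set.seed(1)
dat <- data.frame(Y = rnorm(100),
                  matrix(rnorm(100 * 7), ncol = 7,
                         dimnames = list(NULL, paste0("X_", 1:7))))

# Same formula strings as the answer's loop, e.g. "Y~X_1+X_3"
variables <- colnames(dat)[-1]
formulas <- unlist(lapply(seq_along(variables), function(i)
  apply(combn(variables, i), 2, function(v) paste0("Y~", paste(v, collapse = "+")))))

models <- lapply(formulas, function(f) lm(as.formula(f), data = dat))

# One row per model: number of predictors, AIC and adjusted R-squared
results <- data.frame(
  formula = formulas,
  n_pred  = vapply(models, function(m) length(coef(m)) - 1L, integer(1)),
  AIC     = vapply(models, AIC, numeric(1)),
  adj_r2  = vapply(models, function(m) summary(m)$adj.r.squared, numeric(1)),
  stringsAsFactors = FALSE
)

# Lowest AIC first; head() shows the top few candidates
results <- results[order(results$AIC), ]
head(results)
```

AIC is only one possible criterion; sorting on `adj_r2` (decreasing) works the same way.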
Vincent