
For forward stepwise regression in R, I specify a minimal model and a set of candidate variables that may be added:

min.model = lm(y ~ 1)
fwd.model = step(min.model, direction='forward', scope=(~ x1 + x2 + x3 + ...))

Is there a way to use all variables in a matrix/data.frame as candidates, so I don't have to enumerate them?

These examples illustrate what I'd like to do, but neither of them works:

# 1
fwd.model = step(min.model, direction='forward', scope=(~ ., data=my.data.frame))

# 2
min.model = lm(y ~ 1, data=my.data.frame)
fwd.model = step(min.model, direction='forward', scope=(~ .))
Jaap
Michael Schubert

2 Answers


scope expects (quoting the help page ?step):

either a single formula, or a list containing components ‘upper’ and ‘lower’, both formulae. See the details for how to specify the formulae and how they are used.

You can extract and use the formula corresponding to "~." like this:

> my.data.frame=data.frame(y=rnorm(20),foo=rnorm(20),bar=rnorm(20),baz=rnorm(20))
> min.model = lm(y ~ 1, data=my.data.frame)
> biggest <- formula(lm(y~.,my.data.frame))
> biggest
y ~ foo + bar + baz
> fwd.model = step(min.model, direction='forward', scope=biggest)
Start:  AIC=0.48
y ~ 1

       Df Sum of Sq    RSS      AIC
+ baz   1    2.5178 16.015 -0.44421
<none>              18.533  0.47614
+ foo   1    1.3187 17.214  0.99993
+ bar   1    0.4573 18.075  1.97644

Step:  AIC=-0.44
y ~ baz

       Df Sum of Sq    RSS      AIC
<none>              16.015 -0.44421
+ foo   1   0.41200 15.603  1.03454
+ bar   1   0.20599 15.809  1.29688
> 
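
If you would rather not fit the full model just to read off its formula, the upper scope can also be built from the column names. This is only a minimal sketch: it assumes the response column is called y and that every other column is a candidate predictor, and it uses simulated data with placeholder column names as above.

```r
# Simulated data; the column names are placeholders for illustration only.
set.seed(1)
my.data.frame <- data.frame(y = rnorm(20), foo = rnorm(20),
                            bar = rnorm(20), baz = rnorm(20))

# Every column except the response becomes a candidate term.
predictors <- setdiff(names(my.data.frame), "y")

# reformulate() turns the character vector into the formula ~ foo + bar + baz.
upper <- reformulate(predictors)

min.model <- lm(y ~ 1, data = my.data.frame)

# Give step() an explicit lower and upper scope; trace = 0 suppresses the log.
fwd.model <- step(min.model, direction = "forward",
                  scope = list(lower = ~ 1, upper = upper), trace = 0)
formula(fwd.model)
```

Passing the one-sided formula on its own (scope = upper) should behave the same way here, since ?step treats a single formula as the upper component.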
Stephan Kolassa
  • Have you read about the vast amount of evidence that variable selection causes severe problems of estimation and inference? At the very least, the stepwise approach should be bootstrapped to show its arbitrariness. – Frank Harrell Apr 07 '14 at 12:24
  • @FrankHarrell - where can I learn more about bootstrapping stepwise regression? – EngrStudent Jun 27 '17 at 18:48
  • See http://biostat.mc.vanderbilt.edu/rms and look for the course notes. – Frank Harrell Jun 27 '17 at 20:24
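
As a rough illustration of the bootstrapping suggested in the comments, the sketch below refits the forward selection on resampled rows and records how often each variable is chosen. It is not taken from the linked course notes; the simulated data, the variable names, and the choice of 200 resamples are all assumptions made for the example.

```r
# Simulated data with placeholder column names (pure noise, no real signal).
set.seed(42)
n <- 50
dat <- data.frame(y = rnorm(n), foo = rnorm(n), bar = rnorm(n), baz = rnorm(n))

predictors <- setdiff(names(dat), "y")
upper <- reformulate(predictors)   # ~ foo + bar + baz

n.boot <- 200                      # number of bootstrap resamples (arbitrary)
selected <- matrix(FALSE, nrow = n.boot, ncol = length(predictors),
                   dimnames = list(NULL, predictors))

for (b in seq_len(n.boot)) {
  # Resample rows with replacement and rerun the forward selection.
  boot.dat <- dat[sample(nrow(dat), replace = TRUE), ]
  fit <- step(lm(y ~ 1, data = boot.dat), direction = "forward",
              scope = upper, trace = 0)
  # Record which candidate terms ended up in the selected model.
  selected[b, ] <- predictors %in% attr(terms(fit), "term.labels")
}

# Proportion of resamples in which each variable was selected.
colMeans(selected)
```

If the selection frequencies differ a lot between resamples, that is one way to see how unstable the chosen model is.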

You can do it in one step like this:

fwd.model = step(lm(y ~ 1, data=my.data.frame), direction='forward', scope=~ x1 + x2 + x3 + ...)

shiny