I am trying to write an algorithm in R that does the following:

  1. On a data set dat, use the step function to perform glm model selection, choosing j covariates from a set of J candidate variables.
  2. Compare the j variables in the final model with the full set of J candidates. Write the outcome into a 1xJ vector, where 1 indicates that the variable is in the final model and 0 otherwise.

Example:

In the following example, three variables (x, y, z) are candidates for predicting the variable dep. step is used for variable selection. My goal is to end up with a vector indicating which of the input variables are in the final model; here, c(1,0,1).

n <- 1000
x <- rnorm(n, 0, 1)
y <- rnorm(n, 0, 1)
z <- rnorm(n, 0, 1)

dep <- 1 + 2*x + 3*z + rnorm(n, 0, 1)

m <- step(lm(dep ~ x + y + z), direction = "backward")

I am having difficulty extracting the variable names from the final m$call and creating the vector.

tomka

1 Answer

I think this does it:

n <- 1000

x <- rnorm(n, 0, 1)
y <- rnorm(n, 0, 1)
z <- rnorm(n, 0, 1)

dep <- 1 + 2*x + 3*z + rnorm(n, 0, 1)

m <- step(lm(dep ~ x + y + z), direction = "backward")

# term labels retained in the final model
matt <- attributes(m$terms)
matt$term.labels
#[1] "x" "z"

# 1 if the candidate is in the final model, 0 otherwise
v <- c("x","y","z")
as.integer(v %in% matt$term.labels)
#[1] 1 0 1
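
If you need this for an arbitrary set of J candidates, the same idea can be wrapped in a small function. This is only a sketch: selected_flags is a hypothetical helper name (not part of base R), the seed is arbitrary, and I am assuming the candidates are plain main-effect terms, since the match is done on term labels (an interaction would show up as a label like "x:z"). Note that AIC-based selection can occasionally keep a pure-noise variable such as y, so the result is not guaranteed to be c(1,0,1) for every simulated data set.

```r
# Hypothetical helper: flag which candidate terms survive the selection.
selected_flags <- function(model, candidates) {
  kept <- attr(terms(model), "term.labels")   # term labels of the final model
  setNames(as.integer(candidates %in% kept), candidates)
}

set.seed(1)  # arbitrary seed, only so the simulation is reproducible
n <- 1000
dat <- data.frame(x = rnorm(n), y = rnorm(n), z = rnorm(n))
dat$dep <- 1 + 2 * dat$x + 3 * dat$z + rnorm(n)

full <- lm(dep ~ x + y + z, data = dat)
m <- step(full, direction = "backward", trace = 0)  # trace = 0 silences the log

# Take the candidate list from the full model instead of typing it by hand
candidates <- attr(terms(full), "term.labels")
flags <- selected_flags(m, candidates)
flags
```

Taking the candidates from the full model keeps the output vector in the same order as the original formula, whatever J is.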
Mike Wise
    You need to learn to dumpster dive with `str`. Have a look at Hadley Wickham's book "Advanced R", which is available online. – Mike Wise Jul 04 '15 at 20:35
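
As a rough illustration of that kind of inspection, here is a minimal sketch. It uses the built-in mtcars data purely as a stand-in, and the choice of predictors is arbitrary; the point is only to show how str and names reveal where the term labels live in the fitted object.

```r
# Fit and select on a built-in data set (illustrative predictors only)
m <- step(lm(mpg ~ wt + hp + qsec, data = mtcars),
          direction = "backward", trace = 0)

str(m, max.level = 1)   # top-level components of the fitted model object
names(m)                # the same components, by name; note "terms"

# The terms component carries the retained term labels as an attribute
labels <- attr(m$terms, "term.labels")
labels
```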