I have a problem with forward stepwise regression, and my understanding is that I am not passing the Data argument correctly.

I have the function:

ForwardStep <- function(df,yName, Xs, XsMin) {
    Data <- df[, c(yName,Xs)]
    fit <- glm(formula = paste(yName, " ~ ", paste0(XsMin, collapse = " + ")),
               data = Data, family = binomial(link = "logit") )
    ScopeFormula <- list(lower = paste(yName, " ~ ", paste0(XsMin, collapse = " + ")), 
                         upper = paste(yName, " ~ ", paste0(Xs, collapse = " + ")))
    result <- step(fit, direction = "forward", scope = ScopeFormula, trace = 1 )

    return(result)
}

When I try to run it with the following arguments

df <- data.frame(Y= rep(c(0,1),25),time = rpois(50,2), x1 = rnorm(50, 0,1),
                 x2 = rnorm(50,.5,2), x3 = rnorm(50,0,1))
yName = "Y"
Xs <- c("x1","x2","x3")
XsMin <- 1

res <- ForwardStep(df,yName,Xs,XsMin)

I get the following error: Error in is.data.frame(data) : object 'Data' not found

But if I first define Data in the global environment, it works perfectly fine.

Data <- df[, c(yName,Xs)]

res <- ForwardStep(df,yName,Xs,XsMin)

I guess that I am using the step function incorrectly, but I don't know exactly how to do it the right way.

haphap32

1 Answer

You need to realize that formulas always have an associated environment, see help("formula"). One should never pass text to the formula parameter of model functions, never ever. If you do that, you will encounter scoping issues sooner or later. Usually, I'd recommend computing on the language instead, but you can also create the formulas from text in the correct scope:

ForwardStep <- function(df,Yname, Xs, XsMin) {
  Data <- df[, c(Yname,Xs)]
  # as.formula() is called inside this function, so each formula's
  # environment is this function's frame, which is where Data lives
  f1 <- as.formula(paste(Yname, " ~ ", paste0(XsMin, collapse = " + ")))

  fit <- glm(formula = f1,
             data = Data, family = binomial(link = "logit") )
  # lower and upper bounds of the model search, built in the same scope
  f2 <- as.formula(paste(Yname, " ~ ", paste0(XsMin, collapse = " + ")))
  f3 <- as.formula(paste(Yname, " ~ ", paste0(Xs, collapse = " + ")))

  ScopeFormula <- list(lower = f2, 
                       upper = f3)
  step(fit, direction = "forward", scope = ScopeFormula, trace = 1)
}

df <- data.frame(Y= rep(c(0,1),25),time = rpois(50,2), x1 = rnorm(50, 0,1),
                 x2 = rnorm(50,.5,2), x3 = rnorm(50,0,1))
YName = "Y"
Xs <- c("x1","x2","x3")
XsMin <- 1

res <- ForwardStep(df,YName,Xs,XsMin)
#Start:  AIC=71.31
#Y ~ 1
#
#       Df Deviance    AIC
#<none>      69.315 71.315
#+ x1    1   68.661 72.661
#+ x3    1   68.797 72.797
#+ x2    1   69.277 73.277
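
For comparison, here is a minimal sketch of what a computing-on-the-language variant might look like. It is only one possible reading of that suggestion, not a canonical recipe. The function name ForwardStepLang and the default XsMin = "1" are hypothetical; reformulate() and bquote() are base R. The formulas are built as language objects inside the function, so their environment is again the function's frame, and bquote() splices the formula object into the glm() call before it is evaluated.

```r
# Hypothetical variant of ForwardStep: no formula text is pasted together.
ForwardStepLang <- function(df, yName, Xs, XsMin = "1") {
  Data <- df[, c(yName, Xs)]

  # reformulate() builds formula objects from term labels; by default their
  # environment is the calling frame, i.e. this function, where Data lives.
  lower <- reformulate(as.character(XsMin), response = yName)
  upper <- reformulate(Xs, response = yName)

  # bquote() inserts the formula object itself into the call, so the call
  # stored in the fitted model contains a real formula, not a string.
  fit <- eval(bquote(
    glm(.(lower), data = Data, family = binomial(link = "logit"))
  ))

  step(fit, direction = "forward",
       scope = list(lower = lower, upper = upper), trace = 0)
}

# Usage with simulated data shaped like the example above
set.seed(1)
df <- data.frame(Y = rep(c(0, 1), 25), x1 = rnorm(50),
                 x2 = rnorm(50, 0.5, 2), x3 = rnorm(50))
res <- ForwardStepLang(df, "Y", c("x1", "x2", "x3"))
```

No Data object is needed in the global environment here, because step() looks the data up through the environment attached to the model formula.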

(Public service announcement: step-wise regression is a garbage generator. There are better statistical techniques available.)

Roland
  • *Public service announcement: step-wise regression is a garbage generator. There are better statistical techniques available.* - would you mind sharing them? ;) – dario Mar 09 '20 at 16:02
  • @dario The best technique depends on your use case. You can use regularized regression approaches (LASSO, elastic net) or machine learning approaches. – Roland Mar 09 '20 at 16:05
  • Thank you for your comment @Roland. I was thinking about lasso, but you seemed to have such a clear opinion, I had to ask ;)! – dario Mar 09 '20 at 16:11
  • @Roland, can you add (or link to) a computing-on-the-language example? – eipi10 Mar 09 '20 at 16:35
  • @eipi: This is just what I could find most quickly: https://stackoverflow.com/a/59987272/1412059 – Roland Mar 09 '20 at 17:07