I have a problem and after spending my weekend on it, I would like to ask for help. To explain the problem, I would like to directly jump into an example:

 df <- data.frame(x=rnorm(100), z=rnorm(100), y=rnorm(100), f=rep(1:5,length.out=100 ))
 mod <- lm(y ~ x, data=df[df$z>0,])

I want to recycle the data-argument of the model:

 dat <- mod$call[['data']]

This gives me:

  df[df$z > 0, ]

However, str(dat) will reveal that this is an object of type language. I want to use this expression, however, to access the dataframe that has been used in lm (including the sub-setting), to get the corresponding values of another variable, say f. Note that converting the language object into a character with as.character() will result in a character-vector, and some of the brackets will be lost.
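To make the issue concrete, here is a minimal sketch of what the extracted object looks like. It reuses the example above; `set.seed` and the name `dat_string` are additions for illustration only. It shows that `as.character()` breaks the call into pieces, whereas `deparse()` (mentioned here only for contrast) keeps it as one string:

```r
set.seed(42)  # assumption: added only to make the sketch reproducible
df <- data.frame(x = rnorm(100), z = rnorm(100), y = rnorm(100),
                 f = rep(1:5, length.out = 100))
mod <- lm(y ~ x, data = df[df$z > 0, ])

dat <- mod$call[["data"]]

class(dat)         # an unevaluated call, not a data frame
as.character(dat)  # one element per component of the call; the brackets are not kept together

# deparse() turns the call back into a single string, brackets included
dat_string <- deparse(dat)  # hypothetical name, used only in this sketch
dat_string
```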

I want to use this inside a function, and what I am looking for is something like this:

 foo <- function(fm, var.name){
      dat <- fm$call[['data']]
      new <- paste(dat, "$", var.name, sep="")
      newvar <- eval(parse(text=new), envir=.GlobalEnv)
      # ... do stuff with newvar ...
 }

Without sub-setting, this procedure gives me the variable f if I specify var.name as f. With sub-setting, I run into problems with parse due to the fact that dat is now a character-vector with brackets.

As a side-note: the reason why I want to recycle the data-argument from the lm-function instead of just using the same expression with var.name is that I change the sub-setting quite often, and having it recognized from the lm-object makes my life much easier. It also removes a source of error.

I would be highly indebted if anyone could help me out here...

coffeinjunky

1 Answer

You can simply `eval` the expression, like this:

foo <- function(model, varname) eval(model$call[["data"]])[,varname]
foo(mod, "f")
##  [1] 2 5 2 5 1 2 1 5 2 3 1 2 3 1 3 4 1 3 4 1 2 3 2 4 1 4 1 2 4 5
## [31] 2 4 2 3 4 2 2 3 4 1 3 1 2
dickoa
  • True, this is useful as well. I think I will go with Ben's suggestion though. I think eval makes a copy of the object (here: the dataframe) in the local environment, right? Since my real dataframe is huge, I want to avoid that for computational purposes. Thanks nevertheless! – coffeinjunky Jul 28 '13 at 16:55
  • Yes, it is true that `eval` copies the object into a temporary environment, but you should check the memory usage of your `deparse` solution too. I didn't try it myself. – dickoa Jul 28 '13 at 17:13
  • Yes, I agree. I am actually comparing two methods at the moment and surprisingly, the deparse-solution is slower than the eval-solution. But other things are different as well. I will look more into that at a later time, just mentioning this for other readers. In any case, thanks a lot Dickoa! You guys are really helpful. – coffeinjunky Jul 28 '13 at 17:25
  • FYI the correct environment to evaluate the expression is `environment(model$terms)` - then this function will still work if data was an object created inside a function – hadley Jul 28 '13 at 17:52
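
A minimal sketch of the last comment's suggestion, under stated assumptions: `fit_local` and `local_df` are hypothetical names invented for this illustration, and the model is fitted on a data frame that exists only inside a function. Evaluating the stored `data` expression in `environment(model$terms)` lets `foo` find that data frame:

```r
# Hypothetical helper: the data frame lives only inside this function
fit_local <- function() {
  local_df <- data.frame(x = rnorm(50), y = rnorm(50),
                         f = rep(1:5, length.out = 50))
  lm(y ~ x, data = local_df[local_df$x > 0, ])
}

foo <- function(model, varname) {
  data_expr <- model$call[["data"]]    # unevaluated expression passed as `data`
  env <- environment(model$terms)      # environment in which the formula was created
  eval(data_expr, envir = env)[, varname]
}

set.seed(1)  # assumption: added only for reproducibility
m <- fit_local()
f_used <- foo(m, "f")  # values of f for the rows selected by the subset
```

Here `local_df` is not visible from the global environment, so evaluating the expression there would fail; the formula's environment is the function's own frame, which is where the lookup succeeds.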