
Working with R, this is a real WTF:

R> f_string <- 'Sepal.Length ~ Sepal.Width'
R> l <- with(iris, lm(as.formula(f_string))) # works fine

R> f_formula <- as.formula(f_string)
R> l <- with(iris, lm(f_formula))
Error in eval(expr, envir, enclos) : object 'Sepal.Length' not found

Why does `as.formula` have to be inside the `lm()` call? I get that this is a question about which environment things are evaluated in, because this works:

R> f_formula <- with(iris, as.formula(f_string))
R> lm(f_formula)

but I'm having real trouble wrapping my head around why one works and the other one doesn't.

naught101
  • You might want to clean up your code and make it fully reproducible. However, looking at the code of `as.formula` I also don't understand it. I'd thought that either `as.formula(f_string, env=baseenv())` or `as.formula(f_string, env=parent.frame())` should work (I expected the former), but it only works if `env` is `missing`. (I hope you know that you shouldn't use `with` here. `lm` and friends have a `data` argument for a reason.) – Roland Sep 04 '14 at 07:52
  • @Roland: whoops, accidentally left some cruft in there... – naught101 Sep 04 '14 at 08:03
  • @Roland: totally true about the `data=` argument, too. It lets `lm()` take strings fine, so I can avoid the whole question. Still, it's interesting, as a compsci neophyte :) – naught101 Sep 04 '14 at 08:12
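As Roland and naught101 point out, the `data=` argument sidesteps the environment question: `lm()` looks up the formula's variables in `data` first and only falls back to the formula's own environment for names it cannot find there. A minimal base-R sketch using the built-in `iris` data set (the variable names `fit_data` and `fit_string` are just for illustration):

```r
# Formula built from a string at top level: its environment is the caller's
# (here the global) environment, which has no Sepal.Length in it
f_string  <- "Sepal.Length ~ Sepal.Width"
f_formula <- as.formula(f_string)

# With data= supplied, model.frame() finds Sepal.Length and Sepal.Width
# as columns of iris, so the formula's environment no longer matters
fit_data <- lm(f_formula, data = iris)
print(coef(fit_data))

# lm() also accepts the character string directly when data= is given,
# because model.frame() converts it with as.formula() internally
fit_string <- lm(f_string, data = iris)
print(coef(fit_string))
```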

1 Answer


Your failing example fails because you are creating the formula in the global environment, and the formula remembers that environment:

> f_formula <- as.formula(f_string)
> l <- with(iris, lm(f_formula))
Error in eval(expr, envir, enclos) : object 'Sepal.Length' not found
> str(f_formula)
Class 'formula' length 3 Sepal.Length ~ Sepal.Width
  ..- attr(*, ".Environment")=<environment: R_GlobalEnv> 

and there's no `Sepal.Length` there. If you create objects with those names in the global environment, you can make it "work":

> Sepal.Length=1:10
> Sepal.Width=runif(10)
> l <- with(iris, lm(f_formula)) # "works" (i.e. doesn't error)

But that completely ignores the iris data. Welcome to the world of annoying R behaviour.
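The rule can be made concrete: it is the environment attached to the formula, not the place `lm()` is called from, that decides where the variables are found. The sketch below is an illustration rather than something from the original answer; it rebinds the formula's environment to one built from the columns of `iris` with base R's `list2env()`, after which a bare `lm(f_formula)` fits the iris data without `with()` or `data=`:

```r
f_string  <- "Sepal.Length ~ Sepal.Width"
f_formula <- as.formula(f_string)

# Created at top level, so the formula carries the global environment
print(environment(f_formula))

# Build a fresh environment whose variables are the columns of iris
# and attach it to the formula
environment(f_formula) <- list2env(iris)

# With data= missing, model.frame() looks in environment(f_formula),
# where it now finds Sepal.Length and Sepal.Width
fit_env <- lm(f_formula)
print(coef(fit_env))
```

Supplying `data = iris` is still the idiomatic fix; this only shows which lookup `lm()` is actually doing.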

Your other examples all create the formula object inside `with()`, i.e. in an environment built from the columns of the iris data frame. If you `debug(lm)` and look at what `formula` is in one of your working cases:

Browse[2]> str(formula)
Class 'formula' length 3 Sepal.Length ~ Sepal.Width
  ..- attr(*, ".Environment")=<environment: 0x9d590b4> 

you'll see the environment is no longer the global one. To see what's in that environment, get it from the formula's attributes and list its contents:

Browse[2]> e = attr(formula,".Environment")
Browse[2]> with(e,ls())
[1] "Petal.Length" "Petal.Width"  "Sepal.Length" "Sepal.Width"  "Species"     
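The same inspection works outside the debugger on a formula built inside `with()`, which is the question's second working case. A minimal base-R sketch (the names `f_inside`, `e` and `fit_inside` are illustrative):

```r
f_string <- "Sepal.Length ~ Sepal.Width"

# as.formula() is evaluated inside with(), so its default env = parent.frame()
# is the temporary environment with() builds from the columns of iris
f_inside <- with(iris, as.formula(f_string))

e <- environment(f_inside)
print(ls(e))                      # the five iris column names
print(identical(e, globalenv()))

# The variables live in the formula's own environment, so no data= is needed
fit_inside <- lm(f_inside)
print(coef(fit_inside))
```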
Spacedman
  • Using `environment()` is a slightly prettier (in my opinion) way to get at the environment attached to an object than `attr(x,".Environment")`. So you might do something like `ls(envir=environment(f_formula))`. I'd hardly call this behavior "annoying" when it is easily handled by the `data=` parameter of `lm()`. Don't get mad at functions when you use them incorrectly. As @Roland first said, use `lm(f_formula, iris)` rather than `with(iris, lm(f_formula))`. – MrFlick Sep 04 '14 at 14:03
  • I used that notation because that's how `str(f)` shows it. Most R objects (functions, data frames) don't store their environment in an attribute. I have no idea why formulae do. `print.formula` is interesting. The annoyance is that for a functional language, there are some evaluation things going on in formulae that pure functionalists would baulk at. – Spacedman Sep 04 '14 at 15:05
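MrFlick's suggestion and Spacedman's reply can both be checked directly in base R. For a formula, `environment()` returns exactly the object stored in `attr(x, ".Environment")`; a closure also has an environment, but it is held in the function object itself rather than in an attribute, so the attribute lookup returns `NULL`. A small sketch (the function `g` is just an illustrative example):

```r
f_formula <- as.formula("Sepal.Length ~ Sepal.Width")

# For formulae the environment really is an attribute
print(identical(environment(f_formula), attr(f_formula, ".Environment")))  # TRUE

# Objects in the environment where the formula was created
print(ls(envir = environment(f_formula)))

# A function also carries an environment...
g <- function(x) x + 1
print(environment(g))
# ...but not as an ".Environment" attribute
print(is.null(attr(g, ".Environment")))  # TRUE
```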