Why doesn't R look up the specified object in the parent tree of the provided environment?

Question

Configuration:

OS : Windows 10 (64 bits)
R version: 3.6.3

I'm learning R and am currently reading about environments. While practicing I came up with an example of my own, yet I'm still unable to properly explain how R looks up objects. What I've understood so far (please correct me if I'm wrong) is that if R doesn't find an object in the current environment, it searches each parent environment in turn. To see how this works in practice, I created the following program:

library(rlang)
library(envnames)
library(lobstr)
e1 <- env()
e2 <- new_environment(parent = e1)
e3 <- new_environment(parent = e2)
e4 <- new_environment(parent = e3)
e5 <- new_environment(parent = e4)
e6 <- new_environment(parent = e5)
e7 <- new_environment(parent = e6)
e8 <- new_environment(parent = e7)
e9 <- new_environment(parent = e8)
e10 <- new_environment(parent = e9)
e4$testvar <- 1200
e10$testfun <- function(x) {
    print(envnames::environment_name(caller_env()))
    return (testvar)
}

And here is how I run the above program, selecting e10 as the caller environment:

with(data = e10, expr = e10$testfun())

Given that testvar is defined in the environment e4 and e4 is an ancestor of e10, I expected R to walk up the parent tree from e10 to e4 in order to find the value of testvar. But the program stops with the following error:

Error in e10$testfun() (from #3) : object 'testvar' not found

Could you tell me what I've misunderstood? Shouldn't the fact that I use with(data = e10, ...) imply that the environment used for the function call is e10?

duckmayr · Accepted Answer · 2020-05-03T19:03:32.870

So, this is an unusually nuanced issue. There are two relevant types of environments that you need to think about here, the binding environment, or the environment that has a binding to your function, and the enclosing environment, or the environment where your function was created. In this case the binding environment is e10, but the enclosing environment is the global environment. From Hadley Wickham's Advanced R:

The enclosing environment belongs to the function, and never changes, even if the function is moved to a different environment. The enclosing environment determines how the function finds values; the binding environments determine how we find the function.

Consider the following (executed after executing your supplied code) that demonstrates this:

eval(expression(testfun()), envir = e10)
# [1] "e10"
# Error in testfun() : object 'testvar' not found
testvar <- 600
eval(expression(testfun()), envir = e10)
# [1] "e10"
# [1] 600

Moreover, now consider:

eval(envir = e10, expr = expression(
    testfun2 <- function(x) {
        print(envnames::environment_name(caller_env()))
        return (testvar)
    }
))
eval(expression(testfun2()), envir = e10)
# [1] "e10"
# [1] 1200

I hope this clarifies the issue.

Update: Determining the Enclosing and Binding Environments

So how can we determine the binding and enclosing environments for functions such as testfun()?

As G. Grothendieck's answer shows, the environment() function gives you the enclosing environment for a function:

environment(e10$testfun)
# <environment: R_GlobalEnv>

To my knowledge, there isn't a simple function in base R to give you a function's binding environments. If the function you're looking for is in a parent environment, you can use pryr::where():

pryr::where("mean")
# <environment: base>

(There is a base function to see if a function is in an environment, exists(), and pryr::where() uses it. But exists() only returns TRUE or FALSE; unlike where(), it doesn't tell you which environment in the parent chain holds the binding.)
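For readers without pryr, the same parent-chain walk can be sketched in base R. This is only an illustration of the idea, not pryr's actual code: the helper name `find_binding_env` and the environments `outer_env` and `inner_env` are made up for this example.

```r
# Hypothetical helper: return the first environment in the parent chain of
# `env` that holds a binding for `name`, or NULL if there is none.
find_binding_env <- function(name, env = parent.frame()) {
    while (!identical(env, emptyenv())) {
        # inherits = FALSE restricts the check to this one environment
        if (exists(name, envir = env, inherits = FALSE)) {
            return(env)
        }
        env <- parent.env(env)
    }
    NULL
}

outer_env <- new.env()
inner_env <- new.env(parent = outer_env)
assign("v", 1, envir = outer_env)

identical(find_binding_env("v", inner_env), outer_env)
# [1] TRUE
exists("v", envir = inner_env)                    # inherits = TRUE is the default
# [1] TRUE
exists("v", envir = inner_env, inherits = FALSE)  # v is not bound in inner_env itself
# [1] FALSE
```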

However, if you have to search through child environments, to my knowledge there isn't a function for that. But it seems pretty simple to mock one up:

get_binding_environments <- function(fname) {
    ## First we need to find all the child environments to search through.
    ## We don't want to start from the execution environment;
    ## we probably want to start from the calling environment,
    ## but you may want to change this to the global environment.
    e <- parent.frame()
    ## We want to get all of the environments we've created
    objects <- mget(ls(envir = e), envir = e)
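    environments <- objects[vapply(objects, is.environment, logical(1))]
    ## Then we use exists() to see if the function has a binding in any of them;
    ## inherits = FALSE keeps exists() from also searching each environment's parents
    contains_f <- vapply(environments, function(E) exists(fname, envir = E, inherits = FALSE), logical(1))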
    return(unique(environments[contains_f]))
}

get_binding_environments("testfun")
# [[1]]
# <environment: 0x55f865406518>

e10
# <environment: 0x55f865406518>

Thank you very much for your help. I'm reading the 2nd edition of Hadley's book. Here is the link: http://adv-r.had.co.nz/Environments.html. In your response, you provided the link to the 1st edition of the book and surprisingly I found it to be much more complete and clearer on this topic. Indeed it is quite a complex subject (at least for a beginner like me) with many subtleties. The more I read the documentation, the more I need to read it again and again to grasp all the details. environment() returns the unique enclosing environment. Is there any function providing the binding environments? — user17911, May 03 '20 at 17:52
@user17911 Good question! The answer is: "Sort of." See the edits for more details. — duckmayr, May 03 '20 at 19:03

G. Grothendieck · Answer 2 · 2020-05-06T13:00:43.917

The code in the question defines a function in the global environment. We can query its environment like this:

environment(e10$testfun)
## <environment: R_GlobalEnv>

When a function looks up a free variable (one that is referenced but not defined in the function) such as testvar it uses the function's environment (and ancestors) to find the variable. testvar is in e4 but e4 is not an ancestor of the global environment so in the question testvar is not found.

Other environments are irrelevant. If the function is called then the environment of the caller (also known as the parent frame) plays no part at all in the variable lookup. Similarly, if the function is later placed somewhere else (in this case e10) that environment also plays no part in variable lookup. Furthermore, realize that when the function in the question is placed into e10 it has already been defined in the global environment and so the global environment has already been set as its environment. A function consists of arguments, body and environment (and attributes such as class) and moving the function to be somewhere else does not change any of those constituents.
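A minimal base-R sketch of that point, using made-up names (`f`, `caller`, `holder`) that are not part of the question's code: the free variable `x` is resolved through the function's own environment, not through the caller or the place the function is stored.

```r
x <- "defined next to f"
f <- function() x          # x is a free variable in f

caller <- function() {
    x <- "defined in the caller"
    f()                    # the caller's x is not consulted
}
caller()
## [1] "defined next to f"

holder <- new.env()
holder$x <- "defined in holder"
holder$f <- f              # storing f elsewhere changes none of its constituents
holder$f()
## [1] "defined next to f"

# The three constituents can be inspected with formals(f), body(f) and environment(f).
identical(environment(holder$f), environment(f))
## [1] TRUE
```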

Fixing

We would need to explicitly set the environment of testfun to e10 if we wanted it to have that as its environment and so to have e4 as an ancestor:

environment(e10$testfun) <- e10

or, alternately, we could not define testfun in the global environment in the first place but rather define testfun in the e10 environment right from the start:

with(e10, {
  testfun <- function() testvar
})

e10$testfun()
## [1] 1200
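Both fixes can also be tried without rlang. The following is a sketch under assumed names (`holder_parent`, `holder`, `f`, `g` are invented for the illustration) and uses local() in place of with(), which should behave the same way for this purpose.

```r
holder_parent <- new.env()
holder <- new.env(parent = holder_parent)
holder_parent$testvar <- 1200

# Created at top level, so its environment is the current one, not holder.
holder$f <- function() testvar
identical(environment(holder$f), holder)
## [1] FALSE

# Fix 1: explicitly set the function's environment.
environment(holder$f) <- holder
holder$f()
## [1] 1200

# Fix 2: create the function while evaluating inside holder.
local(g <- function() testvar, envir = holder)
identical(environment(holder$g), holder)
## [1] TRUE
holder$g()
## [1] 1200
```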

Function names

Another point of confusion may be the misconception that the following statement is defining the function named testfun in e10.

e10$testfun <- function() testvar

The problem with that idea is that functions do not have names. The three constituents of a function are arguments, body and environment (and attributes such as class and possibly srcref and srcfile). The name is not a constituent of a function. One can place a function in a variable and refer to that variable as if it were the name of the function but in reality it is just a variable that holds the function and the name is not part of the function itself. Thus in the above line of code we are not defining the function named testfun; rather, we are defining an anonymous function (in the global environment) and then moving it into the variable testfun.
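A small sketch of this, with made-up variable names (`say_hello`, `also_hello`): the same function can sit behind several variables, or behind none at all.

```r
say_hello <- function() "hello"
also_hello <- say_hello           # a second variable holding the same function

identical(say_hello, also_hello)
## [1] TRUE

rm(say_hello)                     # removing one variable does not affect the function
also_hello()
## [1] "hello"

(function() "hello")()            # a function called without ever being given a name
## [1] "hello"
```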

An Example from R Itself

While it is common for user created functions to remain in the environment in which they are defined, e.g.

f <- function() "hello"

# the environment of f
environment(f)
## <environment: R_GlobalEnv>

# the environment where f is located (same)
as.environment(find("f"))
## <environment: R_GlobalEnv>

functions in R packages on the search path

# show search path
search()

normally are not located in their environment. For any function in a package on the search path the function will have the namespace of the package as its environment but when you access the function it will be found, not in the namespace but in a different environment.

# the environment of function mean
e1 <- environment(mean); e1
## <environment: namespace:base>

# where mean is located
e2 <- as.environment(find("mean")); e2
## <environment: base>

# these are NOT the same
identical(e1, e2)
## [1] FALSE

This is well illustrated in diagrams in this blog post: http://blog.obeautifulcode.com/R/How-R-Searches-And-Finds-Stuff/
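The same pattern can be checked for a package other than base. The sketch below uses sd from the stats package purely as an illustration; the variable names `ns` and `pkg` are made up.

```r
library(stats)  # normally attached already; loaded here so the sketch is self-contained

ns  <- environment(stats::sd)             # the function's environment
pkg <- as.environment("package:stats")    # where sd is found on the search path

isNamespace(ns)
## [1] TRUE
identical(ns, pkg)
## [1] FALSE

# Both environments have a binding to the same function.
identical(get("sd", envir = ns), get("sd", envir = pkg))
## [1] TRUE
```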

proto

There is a package that works in the way the question seems to expect. proto objects are like environments but if you assign a function to them then the environment of the function is changed to be the proto object/environment that they are assigned to. (There are other differences too but we focus on this one.)

First we define a proto object p whose parent is e9 and then assign the function of interest to p. Finally we run that function. (The first argument is implicitly the proto object so we omit it.) We see that the function has indeed had its environment reset and that e4 is now an ancestor of its environment without explicitly setting it.

library(proto)

p <- proto(e9)  # define proto object whose parent is e9
p$testfun <- function(self) testvar

identical(p, with(p, environment(testfun)))  # testfun's environment is now p
## [1] TRUE

p$testfun()
## [1] 1200

Thank you very much for your time and for such a detailed and interesting description. I tried what you suggested, that is, environment(e10$testfun) <- e10 and indeed it works. Yet I'm a bit confused about why it works, because as far as I know (please correct me if I'm wrong), environment() returns the enclosing environment of a function which remains unique and the same once the function has been defined and cannot be changed later. I mean this is what has been written in Hadley's book: http://adv-r.had.co.nz/Environments.html#function-envs So how is it possible to change it from global to e10? — user17911, May 03 '20 at 18:00
If `f` is a function then `environment(f)` is one of the three constituents of it and does not change *unless* we explicitly change it using `environment(f) <- whatever`. Normally one does not muck around with environments and if you don't then, of course, it would not change. — G. Grothendieck, May 03 '20 at 21:56