Difference between applying a function() to list() and to new.env()?

Question

Why do I get two different results when I print x$val? I understand that the first x is a list and the second is an environment, but I do not understand why x$val is NA in the second chunk.

x <- list()
x$val <- 1
myfun <- function(x) {x$val <- x$val + NA} 
myfun(x)
x$val
##[1] 1

x <- new.env()
x$val <- 18
myfun <- function(x) {x$val <- x$val + NA} 
myfun(x)
x$val
##[1] NA

Two things: `1 + NA` is just `NA`, and the same goes for `18 + NA`. So it makes sense that the second example outputs `NA`. The first function also computes `NA`, but the list outside the function is not updated because the assignment happens inside the function's own environment. If you use `x$val <<- x$val + NA`, you will see that it updates the value to `NA` as well. — Merijn van Tilborg, Feb 02 '22 at 11:59
Because you haven't assigned the return of `myfun()` to anything when passing it a list. R has no inplace assignment, thus `x` was not modified. The environment example "works" because you're doing variable assignment within an environment. The first is doing variable assignment within the function environment. — caldwellst, Feb 02 '22 at 11:59
@MerijnvanTilborg: The part of your comment about `x$val <<- x$val + NA` is correct, but a little misleading. It will always modify a variable named `x` regardless of what was passed as the argument. If the global list was named `y` and `myfun(y)` was called, `y` would remain unchanged. — user2554330, Feb 02 '22 at 12:23
I would never use `<<-` myself, but given what the OP expected, it is the way to get that behavior. I agree I could have been more detailed about the "why" and about how to do it properly without the risks you mention. — Merijn van Tilborg, Feb 02 '22 at 12:27

score 1 · Answer 1 · answered Feb 02 '22 at 12:15

Environments are "reference objects" in R. That means that if you assign one to a new variable, both variables refer to the same environment, so a change made through either one is visible through the other. Lists are like most other objects and behave as if they are copied on assignment.

So in your first example, myfun(x) makes a separate copy of the list x, and works on that in the function. It has no effect on the global variable x.

In your second example, myfun(x) makes a new reference to the environment x, and works on that in the function. That affects the original variable as well.
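
The same difference shows up without any function call. The following is a minimal sketch, not part of the original answer; the names l1, l2, e1 and e2 are made up for illustration.

```r
# A list behaves as a value: modifying the alias leaves the original alone.
l1 <- list(val = 1)
l2 <- l1
l2$val <- 99
l1$val
# still 1

# An environment is a reference object: both names refer to the same environment.
e1 <- new.env()
e1$val <- 18
e2 <- e1
e2$val <- 99
e1$val
# now 99

# TRUE, because e1 and e2 are the same environment
identical(e1, e2)
```

Passing an object as a function argument follows the same rule as the assignments above, which is why myfun changes the environment but not the list.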

G. Grothendieck · Accepted Answer · 2022-02-02T14:55:08.970

There are several issues here:

Return value: A function returns the value of the last statement executed. In this case both instances of myfun return the new x$val, which is NA (adding NA to any number gives NA), so they do return the same value.
Copy on modify: If an object such as x is modified in a function, the function creates a copy of the object and then modifies the copy. The original object outside the function is not changed.
Object identity: Environments have an identity independent of their contents, so changing the contents of an environment does not change the identity of the environment itself; it only changes the contents. Thus changing the contents of an environment does not cause the environment to be copied within the function. (This is similar to pointers in C, where a program can modify the pointed-to data without modifying the pointer itself.) Lists, on the other hand, do not have an identity distinct from their contents. Modifying the contents of a list within a function causes the list to be copied to a new list, and then the new list is modified.
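
The three points can be checked together in a few lines. This is a small sketch under the assumption that myfun is defined as in the question; the names x_list, x_env, res_list and res_env are hypothetical and only used here.

```r
myfun <- function(x) {x$val <- x$val + NA}

# List: the function returns NA, but the caller's list is untouched.
x_list <- list(val = 1)
res_list <- myfun(x_list)   # the value of the assignment is returned invisibly
is.na(res_list)             # TRUE
x_list$val                  # still 1

# Environment: the function returns NA and the caller's environment is changed.
x_env <- new.env()
x_env$val <- 18
res_env <- myfun(x_env)
is.na(res_env)              # TRUE
is.na(x_env$val)            # TRUE
```

Capturing the result in a variable makes the return value visible; calling myfun(x) at the top level prints nothing because the value of an assignment is returned invisibly.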

Example

Below, we use address from pryr to track the address of the list. For environments, simply printing the environment shows its address, so address is not needed there. The trace statements below cause R to show the address upon entry and upon exit.

The address of the list is ...968 before entering the function and upon entry, but after the list is modified within the function it has become a new list at a new address, ...200. That new list is local to the function and distinct from the list outside the function, which is still at address ...968.

library(pryr)

x <- list()
x$val <- 1
myfun_list <- function(x) {x$val <- x$val + NA}
trace(myfun_list, tracer = quote(print(address(x))), exit = quote(print(address(x))))
## [1] "myfun_list"
address(x)
## [1] "000000000bbbb968"
myfun_list(x)
## Tracing myfun_list(x) on entry 
## [1] "000000000bbbb968"
## Tracing myfun_list(x) on exit 
## [1] "000000000b368200"
## [1] NA
address(x)
## [1] "000000000bbbb968"

An environment, on the other hand, has an identity distinct from its contents, so changing the contents does not cause the environment to be copied to a new environment. The environment starts out at ...238 and its address never changes throughout the code.

x <- new.env()
x$val <- 18
myfun_env <- function(x) {x$val <- x$val + NA} 
trace(myfun_env, tracer = quote(print(x)), exit = quote(print(x)))
## [1] "myfun_env"
x
## <environment: 0x000000000cac4238>
myfun_env(x)
## Tracing myfun_env(x) on entry 
## <environment: 0x000000000cac4238>
## Tracing myfun_env(x) on exit 
## <environment: 0x000000000cac4238>
x
## <environment: 0x000000000cac4238>

Merijn van Tilborg · Answer 3 · 2022-02-02T12:59:37.250

OK, following the legitimate comment about the risk of using <<-, here are some examples.

risky solution

x <- list("val" = 1L)
myfun <- function(x) {x$val <<- x$val + NA} 
myfun(x)
x
# $val
# [1] NA

risky solution going terribly wrong

x <- list("val" = "Please do not alter me at all")
y <- list("val" = 1L)
myfun <- function(x) {x$val <<- x$val + NA} 
myfun(y)
x
# $val
# [1] NA
y
# $val
# [1] 1

let your function output your data

x <- list("val" = 1L)
myfun <- function(x) list("val" = x$val + NA)
x <- myfun(x)
x
# $val
# [1] NA

use the environment

y <- new.env()
y$val <- 18L
myfun <- function(x) {x$val <- x$val + NA} 
myfun(y)
y$val
# [1] NA
