When creating data frames with multiple variables using the data.frame() function, a variable cannot be a function of other variables created within the same data.frame() call. This is demonstrated in the code sample below: Example 1 succeeds because neither the expression for x nor the one for y refers to any object in our environment, and Example 2 returns an error because x is not in the global environment.
Why does this happen?
I can think of two possible explanations, but I do not know how to evaluate them (pun intended):
1. Scoping: each assignment expression is evaluated sequentially (i.e. x is assigned, then y is assigned), but only looks for objects in the environment in which data.frame() was called. Since data.frame() was called in the global environment but x is not in the global environment, an error is returned in Example 2. This may also be why y = 6 rather than y = 1 in Example 3.
2. Evaluation: all assignment expressions are evaluated simultaneously (i.e. in parallel), so x does not yet exist in any environment at the time y is assigned a value that is a function of x. While R employs lexical (i.e. static) scoping, perhaps data.frame() is designed to look for x both in the environment in which data.frame() was called and in the child environments within the function.
# Example 1 (success)
data.frame(x = 0, y = 0 + 1)
#> x y
#> 1 0 1
# Example 2 (failure)
data.frame(x = 0, y = x + 1)
#> Error in data.frame(x = 0, y = x + 1): object 'x' not found
# Example 3
x <- 5
data.frame(x = 0, y = x + 1)
#> x y
#> 1 0 6
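One way to probe the two explanations without reading the data.frame() source is a toy function that, like data.frame(), receives its columns through `...`. The sketch below assumes such a stand-in is representative of data.frame() for this question; collect() and note() are hypothetical helpers written only for this illustration, with note() logging the order in which arguments are evaluated.

```r
# collect() is a hypothetical stand-in for data.frame(): it only gathers
# its `...` arguments into a list, which forces them one by one.
collect <- function(...) list(...)

# note() is a hypothetical helper that records when an argument is evaluated.
eval_order <- character(0)
note <- function(name, value) {
  eval_order <<- c(eval_order, name)  # log first...
  value                               # ...then force the wrapped expression
}

# Mirrors Example 3: a global x exists.
x <- 5
res <- collect(x = note("x", 0), y = note("y", x + 1))
res$y       # 6: `x + 1` saw the global x, not the sibling argument x = 0
eval_order  # "x" "y": arguments were forced left to right

# Mirrors Example 2: without a global x, the same call fails.
rm(x)
err <- tryCatch(collect(x = 0, y = x + 1), error = conditionMessage)
err         # the message reports that object 'x' was not found
```

In this toy, the arguments are forced one at a time from left to right, which is at odds with the "simultaneous" explanation. Each one is still evaluated in the caller's environment, and the name `x =` only labels an element of the resulting list rather than creating a variable that `y` could see. That is consistent with the scoping explanation, although the toy alone does not prove what data.frame() does internally.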
Note: I am trying to understand why data.frame() exhibits this behavior. As observed in the comments and demonstrated below, tibble::tibble() is an excellent option for users who wish to generate variables in a data.frame conditional on other variables in the data.frame.
library(tibble)
# Tibble Example 1: y uses x!
tibble(x = 0, y = x + 1)
#> # A tibble: 1 x 2
#> x y
#> <dbl> <dbl>
#> 1 0 1
# Tibble Example 2: y uses x, ignoring the global x!
x <- 5
tibble(x = 0, y = x + 1)
#> # A tibble: 1 x 2
#> x y
#> <dbl> <dbl>
#> 1 0 1
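For comparison, base R's transform() can add a column computed from an existing one. This is a minimal sketch assuming the goal is only to derive y from x: the data frame is created first, and transform() then evaluates `y = x + 1` with the data frame's columns in scope, so the column x is found before the global x.

```r
# A global x, as in Tibble Example 2.
x <- 5

# Build the data frame first, then add y computed from its column x.
# transform() looks up names in the data frame before the calling environment.
df <- transform(data.frame(x = 0), y = x + 1)
df
#>   x y
#> 1 0 1
```

One caveat: transform() evaluates all of its extra arguments against the original data frame, so unlike tibble(), a later argument in the same call cannot use a column created by an earlier one.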