
The .env pronoun, used to refer to objects in the calling environment (as opposed to columns inside a data frame), works well inside other dplyr verbs but raises an error in slice_max. Why? Consider the following functions:


library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
library(rlang)

f1 <- function(y) {
  d <- tibble(x = runif(20))
  d %>% 
    slice_max(order_by = .data$x, n = .env$y)
}

f2 <- function(y) {
  d <- tibble(x = runif(20))
  d %>% 
    filter(.data$x >= .env$y)
}

f3 <- function(y) {
  d <- tibble(x = runif(20))
  d %>% 
    mutate(z = .env$y)
}

f1(2)
#> Error: `n` must be a single number.
f2(0.8)
#> # A tibble: 8 x 1
#>       x
#>   <dbl>
#> 1 0.936
#> 2 0.812
#> 3 0.998
#> 4 0.962
#> 5 0.901
#> 6 0.875
#> 7 1.00 
#> 8 0.919
f3(2)
#> # A tibble: 20 x 2
#>         x     z
#>     <dbl> <dbl>
#>  1 0.0318     2
#>  2 0.928      2
#>  3 0.983      2
#>  4 0.622      2
#>  5 0.583      2
#>  6 0.0314     2
#>  7 0.481      2
#>  8 0.791      2
#>  9 0.476      2
#> 10 0.599      2
#> 11 0.468      2
#> 12 0.234      2
#> 13 0.276      2
#> 14 0.382      2
#> 15 0.914      2
#> 16 0.736      2
#> 17 0.572      2
#> 18 0.863      2
#> 19 0.337      2
#> 20 0.515      2

Created on 2020-11-16 by the reprex package (v0.3.0)

Giovanni Colitti

1 Answer


The error is thrown by the internal function dplyr:::check_slice_size, which is called by slice_max.data.frame. Lines 7 to 9 of that function are:

        if (!is.numeric(n) || length(n) != 1) {
            abort("`n` must be a single number.")
        }

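So n has to be a length-one numeric value. The .env pronoun only has meaning inside a data mask, and n is evaluated as an ordinary argument rather than with tidy evaluation, so .env$y does not resolve to the function argument y here.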

So is this a bug? I would argue that it isn't. You don't need .env here, because the n argument does not use tidy evaluation, nor should it. Since n only makes sense as a single number, the only situation where evaluating it against the data would be meaningful is a single-row tibble. But if you know you have a single-row tibble, there is no point in calling slice_max. It's a catch-22: the only time you could use tidy evaluation is the one time it wouldn't be useful. It is therefore a sensible design decision.
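To make the distinction concrete, here is a rough base-R sketch. The function toy_slice_max is hypothetical and is not how dplyr implements slice_max; it only mimics the split between a data-masked argument (the ordering expression, evaluated with the data frame's columns in scope) and an ordinary argument (n, evaluated in the caller like any other R argument).

```r
# Hypothetical toy verb, NOT dplyr's implementation.
# `expr` is evaluated with the columns of .df in scope (data masking);
# `n` is an ordinary argument, so it is evaluated in the caller's frame.
toy_slice_max <- function(.df, expr, n) {
  values <- eval(substitute(expr), .df, parent.frame())
  stopifnot(is.numeric(n), length(n) == 1)
  # keep the n rows with the largest values of `expr`
  .df[head(order(values, decreasing = TRUE), n), , drop = FALSE]
}

g <- function(y) {
  d <- data.frame(x = c(0.2, 0.9, 0.5), y = c(10, 20, 30))
  # `x` is looked up in d; `n = y` is the argument y, never the column d$y
  toy_slice_max(d, x, n = y)
}

g(2)
```

Even though d has a column named y, g(2) returns two rows, because n never sees the data frame's columns.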

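You can rest assured that there is no ambiguity. Because n is evaluated as an ordinary argument, a bare y here always refers to the function argument, exactly as you intended with .env$y, even when the data frame also has a column called y: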

library(dplyr)
library(rlang)

f1 <- function(y) {
  d <- tibble(x = runif(20), y = rnorm(20))
  d %>% 
    slice_max(order_by = .data$x, n = y)
}

f1(2)
#> # A tibble: 2 x 2
#>       x      y
#>   <dbl>  <dbl>
#> 1 0.971 -1.65 
#> 2 0.918  0.151

Created on 2020-11-16 by the reprex package (v0.3.0)

Allan Cameron