Let's say I have a function that takes in a data frame and a varying number of variables from that data frame using non-standard evaluation (NSE). Is there a faster/more straightforward way to count the number of provided variables than select()ing these variables and counting the columns?

# Works but seems non-ideal
nvar <- function(df, vars) {
  vars_en <- rlang::enquo(vars)
  df_sub <- dplyr::select(df, !!vars_en)
  ncol(df_sub)
}
nvar(mtcars, mpg:hp)
#> 4
Tung
Jeffrey Girard

1 Answer

Highly doubtful (I realize this may receive downvotes). I think the most sensible alternative is to select directly from the column names of the data frame with tidyselect::vars_select, like so:

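nvar1 <- function(df, vars) {
  # Capture the unevaluated selection expression
  vars_en <- rlang::enquo(vars)
  # Resolve the selection against the column names only, not the full data
  ans <- tidyselect::vars_select(names(df), !!vars_en)
  length(ans)
}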

But even this is slower than the original select() followed by ncol(), at roughly 1.7 times the median run time in the benchmark below:

library(microbenchmark)
library(nycflights13)
library(tidyselect)

nvar <- function(df, vars) {
  vars_en <- rlang::enquo(vars)
  df_sub <- dplyr::select(df, !!vars_en)
  ncol(df_sub)
}

identical(nvar(nycflights13::flights, day:sched_arr_time), nvar1(nycflights13::flights, day:sched_arr_time))
# TRUE

microbenchmark(nvar(nycflights13::flights, day:sched_arr_time), nvar1(nycflights13::flights, day:sched_arr_time), unit='relative', times=100L)

# Unit: relative
#                                              expr      min       lq    mean   median       uq       max neval
#   nvar(nycflights13::flights, day:sched_arr_time) 1.000000 1.000000 1.00000 1.000000 1.000000 1.0000000   100
#  nvar1(nycflights13::flights, day:sched_arr_time) 1.685793 1.680676 1.60114 1.688626 1.660196 0.9878235   100
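For completeness, here is a minimal sketch of a third variant built on tidyselect::eval_select, which resolves a selection against a data frame and returns the matching column positions. It assumes tidyselect 1.0.0 or later is installed; the name nvar2 is hypothetical, and this version was not part of the benchmark above, so no claim is made about its speed.

```r
# Assumption: tidyselect >= 1.0.0 (provides eval_select); nvar2 is a hypothetical name
nvar2 <- function(df, vars) {
  # Capture the unevaluated selection expression
  vars_en <- rlang::enquo(vars)
  # eval_select() returns a named integer vector of selected column positions
  pos <- tidyselect::eval_select(vars_en, data = df)
  length(pos)
}

nvar2(mtcars, mpg:hp)
#> [1] 4
```

Like the other two versions, this only counts what the selection resolves to, so any tidyselect expression (ranges, c(), helpers such as starts_with()) should work as the vars argument.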
CPak