
I just updated to the latest dplyr version, 1.0.0, and am trying out the new features of summarize(), such as across() and the .groups argument. To my surprise, my code has become much slower than with the previous version. Is this a known issue, or am I doing something wrong? Here is an example.

# Create example data set
library(tidyverse)
n_grps <- 10000
n_rep <- 1000
tbbl <- tibble(grp = rep(1:n_grps, each = n_rep),
               value1 = rnorm(n_grps * n_rep),
               value2 = rnorm(n_grps * n_rep))

Running it the old-fashioned way

library(tictoc)
tic()
tbbl %>% 
  group_by(grp) %>% 
  summarize_all(mean) %>% 
  ungroup()
toc()

takes less than a second on my Windows machine. Replacing it with summarize() / across()

tic()
tbbl %>% 
  group_by(grp) %>% 
  summarize(across(everything(), mean), .groups = "drop")
toc()

takes more than 9 seconds!
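A scaled-down sketch for reproducing the comparison, assuming dplyr 1.0.0 or later is installed. It checks that summarize_all(), across() and a third variant with the columns spelled out by hand all return the same means, and reports the elapsed time of each. The explicit third variant and the smaller data sizes are my own additions for comparison, not part of the original benchmark; timings depend on the machine, so none are claimed here.

```r
# Scaled-down comparison; assumes dplyr >= 1.0.0 is installed.
library(dplyr)

set.seed(42)
n_grps <- 1000
n_rep <- 100
small <- tibble(grp = rep(1:n_grps, each = n_rep),
                value1 = rnorm(n_grps * n_rep),
                value2 = rnorm(n_grps * n_rep))

# 1. Scoped verb (superseded in dplyr 1.0.0 but still available)
t_scoped <- system.time(
  res_scoped <- small %>% group_by(grp) %>% summarize_all(mean) %>% ungroup()
)

# 2. across() with .groups, as in the question
t_across <- system.time(
  res_across <- small %>%
    group_by(grp) %>%
    summarize(across(everything(), mean), .groups = "drop")
)

# 3. Columns written out explicitly (extra comparison, not from the question)
t_explicit <- system.time(
  res_explicit <- small %>%
    group_by(grp) %>%
    summarize(value1 = mean(value1), value2 = mean(value2), .groups = "drop")
)

# All three forms should return the same means; only the timings differ
stopifnot(isTRUE(all.equal(as.data.frame(res_scoped), as.data.frame(res_across))))
stopifnot(isTRUE(all.equal(as.data.frame(res_scoped), as.data.frame(res_explicit))))

# Elapsed seconds for each approach (machine-dependent)
print(c(scoped   = t_scoped[["elapsed"]],
        across   = t_across[["elapsed"]],
        explicit = t_explicit[["elapsed"]]))
```

If the across() timing stands out even at this size, the slowdown is in across() itself rather than in the data set.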

Suggestions are welcome.

Henrik
Gijs
    I think it's a known issue: [across() performance slow compared to scoped variant](https://github.com/tidyverse/dplyr/issues/4953). – Henrik Jun 22 '20 at 12:21
