I just updated to the latest dplyr version, 1.0.0, and am trying out the new features of the summarize function, such as across and .groups. To my surprise, my code has become much slower than with the previous version. Is this a known issue, or am I doing something wrong? See the example below.
# Create example data set
library(tidyverse)
n_grps <- 10000
n_rep <- 1000
tbbl <- tibble(grp = rep(1:n_grps, each = n_rep),
               value1 = rnorm(n_grps * n_rep),
               value2 = rnorm(n_grps * n_rep))
Running this the old-fashioned way
library(tictoc)
tic()
tbbl %>%
  group_by(grp) %>%
  summarize_all(mean) %>%
  ungroup()
toc()
takes less than a second on my Windows machine. Replacing it with summarize and across
tic()
tbbl %>%
  group_by(grp) %>%
  summarize(across(everything(), mean), .groups = "drop")
toc()
takes more than 9 seconds!
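For anyone who wants to reproduce this more quickly, here is a smaller sketch of the same comparison. It assumes only dplyr is attached and uses base R's system.time() instead of tictoc. The reduced sizes, the seed, and the names small_tbl, old_res, new_res, old_time and new_time are made up for illustration. It also checks that both approaches return the same means, so the only difference should be run time.

```r
library(dplyr)

set.seed(1)      # arbitrary seed, only so the run is repeatable
n_grps <- 1000   # smaller than in the example above (assumption)
n_rep <- 100

small_tbl <- tibble(grp = rep(1:n_grps, each = n_rep),
                    value1 = rnorm(n_grps * n_rep),
                    value2 = rnorm(n_grps * n_rep))

# Old style: scoped verb summarize_all()
old_time <- system.time(
  old_res <- small_tbl %>%
    group_by(grp) %>%
    summarize_all(mean) %>%
    ungroup()
)

# New style: summarize() with across()
new_time <- system.time(
  new_res <- small_tbl %>%
    group_by(grp) %>%
    summarize(across(everything(), mean), .groups = "drop")
)

# Both approaches should give the same per-group means
stopifnot(isTRUE(all.equal(as.data.frame(old_res), as.data.frame(new_res))))

# Elapsed seconds for each approach (values depend on machine and dplyr version)
old_time["elapsed"]
new_time["elapsed"]
```

With the group count this small the absolute times are short, so increasing n_grps may be needed to see the gap clearly.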
Suggestions are welcome.