I am loading a large dataset, filtering it down to approximately 1/20th of the rows, and then using group_by on 5 columns and summarizing the 3 remaining ones.

The vroom benchmarks page (https://vroom.r-lib.org/articles/benchmarks.html) says that sampling, filtering, and grouped aggregation are much faster because of the lazy ALTREP implementation.

Given that "Once a particular vector is fully materialized the speed for all subsequent operations should be identical to a normal R vector.", my question is: could it still be advantageous to use dtplyr or data.table for the summarize operation after filtering?
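
To make the comparison concrete, here is a minimal sketch of the two pipelines in question. Everything specific in it is a hypothetical stand-in rather than the actual dataset: the temporary CSV, the column names (`g1` to `g5`, `keep`, `x`, `y`, `z`), the `keep == 1` filter that retains roughly 1/20th of the rows, and the use of `sum()` as the summary function.

```r
library(vroom)
library(dplyr)
library(data.table)

# Hypothetical data: write a small CSV so the sketch is self-contained.
set.seed(1)
n <- 1e5
path <- tempfile(fileext = ".csv")
df <- data.frame(
  g1 = sample(letters[1:3], n, replace = TRUE),
  g2 = sample(letters[1:3], n, replace = TRUE),
  g3 = sample(letters[1:3], n, replace = TRUE),
  g4 = sample(letters[1:3], n, replace = TRUE),
  g5 = sample(letters[1:3], n, replace = TRUE),
  keep = sample(1:20, n, replace = TRUE),  # keep == 1 is about 1/20th of rows
  x = runif(n), y = runif(n), z = runif(n)
)
vroom_write(df, path, delim = ",")

# Option A: lazy vroom read, then filter / group_by / summarise in dplyr.
a <- vroom(path, show_col_types = FALSE) %>%
  filter(keep == 1) %>%
  group_by(g1, g2, g3, g4, g5) %>%
  summarise(across(c(x, y, z), sum), .groups = "drop")

# Option B: same read and filter, then aggregate with data.table.
filtered <- vroom(path, show_col_types = FALSE) %>%
  filter(keep == 1) %>%
  as.data.table()
b <- filtered[, lapply(.SD, sum),
              by = .(g1, g2, g3, g4, g5),
              .SDcols = c("x", "y", "z")]
```

Wrapping each option in `system.time()` on the real file would show whether the data.table aggregation in Option B pays off once the filtered columns are materialized. A dtplyr variant would wrap the filtered data in `dtplyr::lazy_dt()` and keep the dplyr verbs from Option A.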

Arthur Yip
  • data.table or dtplyr as opposed to what? What do you want to compare? – Rasmus Larsen Oct 21 '22 at 06:49
  • I am trying to determine whether it is advantageous to do filter, group_by, and summarize within a lazily loaded vroom/readr data frame, or whether to use dtplyr or data.table to do the filter, group_by, and summarize / aggregate on loaded/materialized data frames. – Arthur Yip Oct 21 '22 at 22:51

0 Answers