setDT instead of as.data.table for piping with dplyr?

Asked Apr 04 '20 at 12:41

I've noticed that dtplyr (version 1.0.1, released this January) uses as.data.table to convert the object back to a data.table: https://dtplyr.tidyverse.org/articles/translation.html

I'm a big fan and long-time user of data.table and have used it in pipelines with dplyr for many years. For that purpose I wrote many wrapper functions myself, much like the ones that are now part of dtplyr.

However, I use setDT, because I thought it was more efficient, since it keeps with the data.table approach of assigning by reference.

So I wonder why Hadley isn't using it.
More generally, which of the two is more efficient when one needs to convert a data.frame (or tibble) to a data.table?
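A minimal sketch of the behavioural difference between the two, assuming a recent version of data.table is installed; the data frames `df1` and `df2` and their columns are hypothetical and exist only for illustration.

```r
# Assumption: data.table is installed; df1/df2 are illustrative inputs only
library(data.table)

df1 <- data.frame(id = 1:3, value = c(10, 20, 30))
df2 <- data.frame(id = 1:3, value = c(10, 20, 30))

# as.data.table() returns a new data.table built from a copy; df1 is untouched
dt1 <- as.data.table(df1)
class(df1)  # "data.frame"
class(dt1)  # "data.table" "data.frame"

# setDT() converts df2 itself, by reference; nothing new is assigned
setDT(df2)
class(df2)  # "data.table" "data.frame"

# dt1 is independent of df1, so updating dt1 by reference leaves df1 alone
dt1[, value := value * 2]
df1$value  # 10 20 30
dt1$value  # 20 40 60
```

setDT() skips the copy, which is why it tends to be cheaper on large inputs, but the caller's object is changed as a side effect. as.data.table() pays for the copy so that the input is left exactly as it was.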

IVIM

as.data.table is used because it guarantees that a copy of the input object is made. In general this is slower than setDT, but tidyverse principles require immutability, so the input object must not be changed. IIRC there is an option to disable this; I forget its exact name off the top of my head, but you can check the vignette. – MichaelChirico Apr 04 '20 at 13:01

0 Answers