dplyr functions are typically incredibly performant, having been optimised by the open-source R community, with many running C++ under the hood to make them much faster.
Does the BigQuery code generated through bigrquery and dbplyr receive any optimisation, or does it simply generate the SQL however it can (unoptimised)? (Note that both bigrquery and dbplyr, like dplyr, are tidyverse packages, and both are authored by the dplyr author Hadley Wickham.)
Background
I'm interested in how optimised the generated BigQuery code is because I am trying to decide whether it is worth further optimising some batch processes written with bigrquery and dbplyr by manually rewriting some of the BigQuery code (rather than using those packages). If I am unlikely to see substantial performance improvements, I will not dedicate the time to do so.
Example query
The following is from the bigrquery readme:
library(dplyr)

# con is a DBI connection to BigQuery, e.g. created with
# DBI::dbConnect(bigrquery::bigquery(), project = ..., dataset = ..., billing = ...)
natality <- tbl(con, "natality")

natality %>%
  select(year, month, day, weight_pounds) %>%
  head(10) %>%
  collect()
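
For reference, the SQL that dbplyr generates for a pipeline like this can be inspected with show_query(), which prints the translated query without executing it, so a hand-written rewrite could be compared against it directly (a minimal sketch reusing the natality table from above):

# Print the BigQuery SQL that dbplyr translates the pipeline into,
# without sending the query to BigQuery
natality %>%
  select(year, month, day, weight_pounds) %>%
  head(10) %>%
  show_query()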