On the one hand, is the tbl()
function which is a SQL statement of type SELECT * FROM Table
this works over DBMS and to retrieve data from database server it need a pull function like collect()
. On the other hand, is the function dbGetQuery()
which retrieves the data into R session using a SQL query. Both requires a connection to server and a statement, but the first create the statement using SQL translations and the other is the user which write the SQL query.
To illustrate I will use a temporal table tmp
in a postgreSQL DBMS:
# Example on postgreSQL
library(tidyverse)
library(dbplyr)
library(RPostgreSQL)
library(DBI) # This is loaded with RPostgreSQL package
con <- dbConnect(PostgreSQL(),
dbname="test",
host="localhost",
port=5432,
user="user",
password="pass")
Dummy data to PostgreSQL server
date <- data_frame(Col1 = c("20180212", "20180213"),
Col2 = c("A", "B"))
dbWriteTable(con, "tmp", date, temporary = TRUE)
With tbl()
function
tbl(con, "tmp") %>% show_query()
#><SQL>
#>SELECT *
#>FROM "tmp"
tbl(con, "tmp") %>%
mutate(date = to_date(Col1, "YYYYMMDD")) %>%
show_query()
#><SQL>
#>SELECT "row.names", "Col1", "Col2", TO_DATE("Col1", 'YYYYMMDD') AS "date"
#>FROM "tmp"
tbl(con, "tmp") %>%
mutate(date = to_date(Col1, "YYYYMMDD")) %>% #this works on DBMS
collect() %>% #This retrive to R session
str()
#>Classes ‘tbl_df’, ‘tbl’ and 'data.frame': 2 obs. of 3 variables:
#> $ row.names: chr "1" "2"
#> $ Col1 : chr "20180212" "20180213"
#> $ Col2 : chr "A" "B"
#> $ date : Date, format: "2018-02-12" "2018-02-13"
With dbGetQuery()
function
dbGetQuery(con, "SELECT * FROM tmp") %>%
str()
#>'data.frame': 2 obs. of 3 variables:
#> $ row.names: chr "1" "2"
#> $ Col1 : chr "20180212" "20180213"
#> $ Col2 : chr "A" "B"
dbGetQuery(con, "SELECT * FROM tmp") %>%
mutate(date = as.Date(Col1, format = "%Y%m%d")) %>% #This works on R session
str()
#>'data.frame': 2 obs. of 4 variables:
#> $ row.names: chr "1" "2"
#> $ Col1 : chr "20180212" "20180213"
#> $ Col2 : chr "A" "B"
#> $ date : Date, format: "2018-02-12" "2018-02-13"
Conclusion
tbl()
function is a high level over dbGetQuery()
in R-programming. Consider to re-design your code chain understanding the differences between both functions and best uses for these.