3

I am working with the R programming language. I am trying to convert "dplyr/dbplyr" code into SQL code using the "show_query()" option.

For example, I tried to run the following code:

#first code

library(dplyr)
library(dbplyr)

data(iris)

 iris %>% 
 filter(Species == "setosa")  %>% 
  summarise(mean.Sepal.Length = mean(Sepal.Length),
            mean.Petal.Length = mean(Petal.Length))  %>% show_query()

However, this returned the following error (note: when you remove "show_query()", the above code actually runs):

Error in UseMethod("show_query") : 
  no applicable method for 'show_query' applied to an object of class "data.frame"

I think I found a solution to this problem:

#second code

> con <- DBI::dbConnect(RSQLite::SQLite(), ":memory:")

> flights <- copy_to(con, iris)

> flights %>% 
    filter(Species == "setosa")  %>% 
     summarise(mean.Sepal.Length = mean(Sepal.Length),
              mean.Petal.Length = mean(Petal.Length))  %>% show_query()


<SQL>
SELECT AVG(`Sepal.Length`) AS `mean.Sepal.Length`, AVG(`Petal.Length`) AS `mean.Petal.Length`
FROM `iris`
WHERE (`Species` = 'setosa')

Warning message:
Missing values are always removed in SQL.
Use `mean(x, na.rm = TRUE)` to silence this warning
This warning is displayed only once per session. 

Can someone please tell me why the original code I tried did not work, but the second code is working? Why is it necessary to establish a connection and add the "copy_to" statement - even if I want to run something locally? I am just curious to convert DPLYR code into SQL - at this point, I just want to run everything locally, and not connect to a remote database. Thus, why do I need to establish a connection if I want to run this locally? Why does the show_query() statement not work in the original code?

Simon.S.A.
  • 6,240
  • 7
  • 22
  • 41
stats_noob
  • 5,401
  • 4
  • 27
  • 83
  • 1
    Hey, @stats555 did you ever get the answer to this one? I would love to understand how this is working as well – hachiko Jan 19 '22 at 16:31
  • @ hachiko: hey! so far nothing! :( – stats_noob Jan 19 '22 at 16:32
  • 1
    ah shoot well I started a bounty for you let's see if we can get an answer – hachiko Jan 19 '22 at 16:33
  • Wow - thank you! I never knew other people would be interested in this as well! – stats_noob Jan 19 '22 at 16:34
  • I'm trying to avoid using SQL! :P – hachiko Jan 19 '22 at 16:35
  • I actually knew how to do this in SQL! I was just curious to know why this detail wasnt working! – stats_noob Jan 19 '22 at 16:37
  • I missed this question originally because it was not tagged dbplyr. You will find several related answers under that tag, including: [this](https://stackoverflow.com/questions/59307877/can-you-name-dbplyrs-simulated-lazy-tables) and [this](https://stackoverflow.com/questions/49078185/how-to-generate-sql-from-dbplyr-without-a-database-connection). – Simon.S.A. Jan 19 '22 at 20:16

2 Answers2

4

show_query() translates the dplyr syntax into query code for the backend you are using.

A database backend using dbplyr will result in an SQL query (as a data.table backend using dtplyr will result in a DT[i,j,by] query).

show_query doesn't need to have a method to translate dplyr syntax applied to a data.frame backend to itself, hence the error message you're getting.

An easy way to get an SQL query result is to transform the data.frame into an in-memory database with memdb_frame:

memdb_frame(iris) %>% 
  filter(Species == "setosa")  %>% 
  summarise(mean.Sepal.Length = mean(Sepal.Length),
            mean.Petal.Length = mean(Petal.Length))  %>% show_query()

<SQL>
SELECT AVG(`Sepal.Length`) AS `mean.Sepal.Length`, AVG(`Petal.Length`) AS `mean.Petal.Length`
FROM `dbplyr_002`
WHERE (`Species` = 'setosa')
Waldi
  • 39,242
  • 6
  • 30
  • 78
4

dbplyr translation of R commands to SQL only works for remote tables. show_query() reveals the translated query that would be used to fetch data from the database. If the table is in local R memory then there is no need for SQL translation.

Part of the reason for this is that dbplyr has different translations defined for different databases. So without knowing what flavor of SQL/database you are using it can not determine the correct translation.

If you want to produce translations without connecting to a database, you can use simulated connections:

library(dbplyr)
library(dplyr)

data(iris)

# SQL server translation
remote_df = tbl_lazy(iris, con = simulate_mssql())
remote_df %>%
  filter(Species == 'setosa') %>%
  head() %>%
  show_query()

# MySQL translation
remote_df = tbl_lazy(iris, con = simulate_mysql())
remote_df %>%
  filter(Species == 'setosa') %>%
  head() %>%
  show_query()

These will produce slightly different SQL translations TOP vs LIMIT keywords.

Simon.S.A.
  • 6,240
  • 7
  • 22
  • 41