0

The following example from the docs of tidyr::expand() works as expected on a local, in-memory data.frame. The same example works on a tbl_dbi object when called on a single column, but displays an SQL parsing error when applied to multiple columns:

library(tidyverse, quietly = TRUE)
library(duckdb, quietly = TRUE)

fruits <- tibble(
  type   = c("apple", "orange", "apple", "orange", "orange", "orange"),
  year   = c(2010, 2010, 2012, 2010, 2010, 2012),
  size  =  factor(
    c("XS", "S",  "M", "S", "S", "M"),
    levels = c("XS", "S", "M", "L")
  ),
  weights = rnorm(6, as.numeric(size) + 2)
)

fruits %>% tidyr::expand(type, size)
#> # A tibble: 8 × 2
#>   type   size 
#>   <chr>  <fct>
#> 1 apple  XS   
#> 2 apple  S    
#> 3 apple  M    
#> 4 apple  L    
#> 5 orange XS   
#> 6 orange S    
#> 7 orange M    
#> 8 orange L
## now create a remote tbl
con <- dbConnect(duckdb::duckdb())
dbWriteTable(con, "fruits", fruits)
fruits_db <- tbl(con, "fruits")

## same command works with one column:
fruits_db %>% tidyr::expand(type)
#> # Source:   lazy query [?? x 1]
#> # Database: duckdb_connection
#>   type  
#>   <chr> 
#> 1 apple 
#> 2 orange
## but fails with two:
fruits_db %>% tidyr::expand(type, size)
#> Error in .local(conn, statement, ...): duckdb_prepare_R: Failed to prepare query SELECT *
#> FROM (SELECT "type", "size"
#> FROM (SELECT DISTINCT "type"
#> FROM "fruits") "LHS"
#> LEFT JOIN (SELECT DISTINCT "size"
#> FROM "fruits") "RHS"
#> ) "q01"
#> LIMIT 11
#> Error: Parser Error: syntax error at or near ")"
#> LINE 7: ) "q01"
#>         ^

Created on 2021-10-01 by the reprex package (v2.0.1)

Why does this happen and how can we avoid it? (It looks to me like a very spurious "q01" is being attached to the end of SQL generated by the tidyr translation, but no idea why that should be the case and why it would happen only in the two-column expand case).

I suspect this is a bug, but in this context I'm not sure how to better pinpoint the source of the error -- i.e. is the problem due to a bug in duckdb, dbplyr, tidyr, or some other component of the translation?

cboettig
  • 12,377
  • 13
  • 70
  • 113
  • 1
    Probably, the syntax generated may not work in duckdb. I tried the same with `sqlite` and it is working fine `fruits_db2 %>% tidyr::expand(type, size) # Source: lazy query [?? x 2] # Database: sqlite 3.35.5 [:memory:]` – akrun Oct 01 '21 at 22:07
  • The only difference with `show_query` I find is the double quote versus the backquote `SELECT `type`, `size` FROM (SELECT DISTINCT `type`` in sqlite and `SELECT "type", "size" FROM (SELECT DISTINCT "type"` and in duckdb – akrun Oct 01 '21 at 22:08
  • I tried this query outside the expand, but it is still having an error, so may be this is not implemented `dbGetQuery(con, "SELECT type, size FROM (SELECT DISTINCT type FROM fruits) AS LHS LEFT JOIN (SELECT DISTINCT size FROM fruits) AS RHS")` – akrun Oct 01 '21 at 22:57
  • 1
    Thanks @akrun . I reported this to duckdb devs, https://github.com/duckdb/duckdb/issues/2362, though it sounds like `dbplyr` might be generating imperfect SQL syntax (a JOIN with no ON) that SQLite decides to assume is actually a CROSS JOIN? – cboettig Oct 03 '21 at 03:35

0 Answers0