The following example from the docs of tidyr::expand()
works as expected on a local, in-memory data.frame. The same example works on a tbl_dbi
object when called on a single column, but displays an SQL parsing error when applied to multiple columns:
library(tidyverse, quietly = TRUE)
library(duckdb, quietly = TRUE)
fruits <- tibble(
type = c("apple", "orange", "apple", "orange", "orange", "orange"),
year = c(2010, 2010, 2012, 2010, 2010, 2012),
size = factor(
c("XS", "S", "M", "S", "S", "M"),
levels = c("XS", "S", "M", "L")
),
weights = rnorm(6, as.numeric(size) + 2)
)
fruits %>% tidyr::expand(type, size)
#> # A tibble: 8 × 2
#> type size
#> <chr> <fct>
#> 1 apple XS
#> 2 apple S
#> 3 apple M
#> 4 apple L
#> 5 orange XS
#> 6 orange S
#> 7 orange M
#> 8 orange L
## now create a remote tbl
con <- dbConnect(duckdb::duckdb())
dbWriteTable(con, "fruits", fruits)
fruits_db <- tbl(con, "fruits")
## same command works with one column:
fruits_db %>% tidyr::expand(type)
#> # Source: lazy query [?? x 1]
#> # Database: duckdb_connection
#> type
#> <chr>
#> 1 apple
#> 2 orange
## but fails with two:
fruits_db %>% tidyr::expand(type, size)
#> Error in .local(conn, statement, ...): duckdb_prepare_R: Failed to prepare query SELECT *
#> FROM (SELECT "type", "size"
#> FROM (SELECT DISTINCT "type"
#> FROM "fruits") "LHS"
#> LEFT JOIN (SELECT DISTINCT "size"
#> FROM "fruits") "RHS"
#> ) "q01"
#> LIMIT 11
#> Error: Parser Error: syntax error at or near ")"
#> LINE 7: ) "q01"
#> ^
Created on 2021-10-01 by the reprex package (v2.0.1)
Why does this happen and how can we avoid it? (It looks to me like a very spurious "q01" is being attached to the end of SQL generated by the tidyr translation, but no idea why that should be the case and why it would happen only in the two-column expand case).
I suspect this is a bug, but in this context I'm not sure how to better pinpoint the source of the error -- i.e. is the problem due to a bug in duckdb
, dbplyr
, tidyr
, or some other component of the translation?