0

I have a database that I want to query in R using duckdb. The two tables in question are large, 183 million rows by eight columns. When I execute the following code:


compareid <- dbGetQuery(con, "(SELECT UserId FROM d14072021  EXCEPT SELECT UserId FROM d15072021 ) UNION ALL 
                        ( SELECT UserId FROM d15072021  EXCEPT SELECT UserId FROM d14072021 )")

R straight-up crashes with a fatal error. Since I don't actually get an error message in-console I have no idea what's going wrong

I suspect it might be a memory issue? I've tried limiting the size by only looking at the first 100 rows:

compareid <- dbGetQuery(con, "(SELECT UserId FROM d14072021  EXCEPT SELECT UserId FROM d15072021 LIMIT 100) UNION ALL                    
     ( SELECT UserId FROM d15072021  EXCEPT SELECT UserId FROM d14072021 LIMIT 100)")

but this crashes all the same.

EDIT: I just found out that LIMIT is an output modifier, so this doesn't actually limit the number of rows the query selects.

  • Have you tried executing that on a duckdb console instead of from within R? – r2evans Nov 21 '22 at 17:49
  • I might be wrong here but I don't think duckdb has its own console. I installed it via R. – dominik hauser Nov 22 '22 at 10:12
  • https://duckdb.org/, go to "Installation", look for the "CLI" option. Alternatively (same files/installers), go to https://github.com/duckdb/duckdb/releases/ and get one of the `*_cli-*` zip files. – r2evans Nov 22 '22 at 11:40

0 Answers0