3

I'm trying to select some subsets from the dataframe in R using sqldf. So my code looks like:

library("sqldf")
...
usecase <- as.data.frame(use_case_list[[i]])
user_day_view <- sqldf("select distinct targetuser, day, count(*) from usecase group by targetuser,day")
...

All in all, I do about 80-90 such sqldf queries in a for / foreach loop.

It works well on the test dataset, however, when I try to run it on real dataset (about 21 Gb of data), at some point it fails with the following error message:

"rsqlite_query_fetch: failed first step: database or disk is full"

At this point, an R process uses about 60 Gb RAM, but I still have several Tb RAM free...

I have checked a documentation of sqldf (https://cran.r-project.org/web/packages/sqldf/sqldf.pdf) and it explicitly says, that

"For SQLite and h2 data bases this defaults to ":memory:" which results in an embedded database"

So I should have an in-memory SQLite database, since I do not initialise it in any way and also do not provide any options/parameters to sqldf. So I cannot understand, why it reports about full database or disk if I still have enough RAM to run it...

Andrey Sapegin
  • 454
  • 8
  • 33
  • Try `dbname = tempfile()` as an argument to `sqldf`. – G. Grothendieck Jul 14 '16 at 10:47
  • I have already tried drv="RH2" and it works (I switched the database from SQLite to H2), and am now trying dbname=NULL (probably that's what you meant, cause dbname=tempfile() will use filesystem instead of RAM). – Andrey Sapegin Jul 14 '16 at 14:51
  • I was suggesting you use the file system -- not NULL -- but if H2 works and sqlite does not then certainly just use that. – G. Grothendieck Jul 15 '16 at 01:09
  • ha, the H2 works longer than SQlite, but in the end it also fails with 'java.lang.OutOfMemoryError: Java heap space'... – Andrey Sapegin Jul 18 '16 at 07:17
  • 1
    You can increase the java heap space to 4 gigabytes, say, by using this prior to the first statement which loads java: `options( java.parameters = "-Xmx4g" )` . You could also try PostgreSQL although you will have to install it first as it does not come fully packaged in an R package like SQLite in RSQLite and H2 in RH2. – G. Grothendieck Jul 18 '16 at 10:58

0 Answers0