I'm trying to select some subsets from a data frame in R using sqldf. My code looks like this:
library("sqldf")
...
usecase <- as.data.frame(use_case_list[[i]])
user_day_view <- sqldf("select distinct targetuser, day, count(*) from usecase group by targetuser,day")
...
All in all, I run about 80-90 such sqldf queries inside a for/foreach loop.
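For context, the loop roughly looks like this (the contents of `use_case_list` and the number of queries per iteration are simplified for illustration; only the one query shown above is verbatim):

```r
library(sqldf)

# use_case_list is a list of large data frames built earlier (illustrative)
results <- vector("list", length(use_case_list))

for (i in seq_along(use_case_list)) {
  usecase <- as.data.frame(use_case_list[[i]])
  # one of the ~80-90 aggregation queries run per iteration
  results[[i]] <- sqldf("select distinct targetuser, day, count(*)
                         from usecase group by targetuser, day")
}
```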
It works well on the test dataset; however, when I try to run it on the real dataset (about 21 GB of data), at some point it fails with the following error message:
"rsqlite_query_fetch: failed first step: database or disk is full"
At this point, the R process uses about 60 GB of RAM, but I still have several TB of RAM free...
I have checked the sqldf documentation (https://cran.r-project.org/web/packages/sqldf/sqldf.pdf) and it explicitly says:
"For SQLite and h2 data bases this defaults to ":memory:" which results in an embedded database"
So I should have an in-memory SQLite database, since I do not initialise it in any way and do not pass any options/parameters to sqldf. I cannot understand why it reports a full database or disk when I still have enough free RAM to run it...
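In other words, if I understand the documentation correctly, my calls should be equivalent to passing the default connection string explicitly (`dbname` is a documented sqldf argument, and `":memory:"` is its stated default for SQLite):

```r
library(sqldf)

# Explicitly requesting the in-memory SQLite database that sqldf
# is documented to use by default when no dbname is supplied;
# this should behave identically to my original call
user_day_view <- sqldf(
  "select distinct targetuser, day, count(*) from usecase group by targetuser, day",
  dbname = ":memory:"
)
```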