I need to analyse a large dataset (~40 GB, 3 million rows) that is too big to open in a spreadsheet or to load entirely into R. To address that, I loaded it into an SQLite database, then used R (with RSQLite) to split it into chunks I can manipulate (70,000 rows each). I need the result in data.frame format, so I used as.data.frame():
library(DBI)
library(dplyr)   # dbplyr supplies the SQLite backend for tbl()

# Connect to the database
con <- dbConnect(RSQLite::SQLite(), dbname = "path")

# Create a lazy reference to the table
d <- tbl(con, "Test")

# Filter with dplyr, then convert the result to a data.frame
d %>%
  filter("reduce number of rows") %>%   # placeholder for my actual filter condition
  as.data.frame()
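
For reference, the chunked reading I mention above looks roughly like this (a minimal sketch: the table name "Test" and the 70,000-row chunk size come from my setup, while process_chunk() is just a hypothetical placeholder for whatever I do with each piece):

library(DBI)

con <- dbConnect(RSQLite::SQLite(), dbname = "path")

# Send the query once, then fetch the result 70,000 rows at a time
res <- dbSendQuery(con, "SELECT * FROM Test")
while (!dbHasCompleted(res)) {
  chunk <- dbFetch(res, n = 70000)   # each chunk is an ordinary data.frame
  # process_chunk(chunk)             # hypothetical per-chunk processing
}
dbClearResult(res)
dbDisconnect(con)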
It works, but it takes a lot of time. How can I make this faster, given that I have limited RAM? I also tried setDT(), but it doesn't work on SQLite data:
d %>% setDT()
Error in setDT(.) :
All elements in argument 'x' to 'setDT' must be of same length, but the profile of input lengths (length:frequency) is: [2:1, 13:1]
The first entry with fewer than 13 entries is 1
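
In case it is relevant, my understanding (possibly wrong) is that setDT() only works on objects that are already in memory, so I assume the filtered rows would have to be collected first, something like this sketch (reusing the placeholder filter condition from above):

library(data.table)
library(dplyr)

d %>%
  filter("reduce number of rows") %>%   # placeholder for the real condition
  collect() %>%                         # pull the filtered rows into R as a data.frame
  setDT()                               # then convert in place to a data.table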