I have a simple R program that reads a table (1,000,000 rows, 10 columns) from an SQLite database into an R data.table, does some operations on the data, and then tries to write the result back into a new table of the same SQLite database. Reading the data takes a few seconds, but writing the table back into the SQLite database takes hours. I don't know exactly how long, because it has never finished; the longest I have let it run is 8 hours.
This is the simplified version of the program:
library(DBI)
library(RSQLite)
library(data.table)

# Connect to the SQLite database
driver <- dbDriver("SQLite")
con <- dbConnect(driver, dbname = "C:/Database/DB.db")

# Read Table1 into a data.table (this takes only a few seconds)
DB <- data.table(dbGetQuery(con, "SELECT * from Table1"))

# ... some operations on DB (omitted in this simplified version) ...

# Write the result back into a new table (this is the step that never finishes)
dbSendQuery(con, "DROP TABLE IF EXISTS Table2")
dbWriteTable(con, "Table2", DB)

dbDisconnect(con)
dbUnloadDriver(driver)
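For anyone trying to reproduce this: the slow step is the dbWriteTable() call, not the read or the in-between operations. A quick way to confirm that on your own machine (just a sketch using base R's system.time(), not part of my actual script) is:

# Time the read and the write separately to isolate the slow step
read.time <- system.time(
  DB <- data.table(dbGetQuery(con, "SELECT * from Table1"))
)
write.time <- system.time(
  dbWriteTable(con, "Table2", DB)   # this call runs for hours in my case
)
print(read.time)    # finishes in a few seconds
print(write.time)   # only printed if the write ever returns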
I'm using R version 2.15.2; package versions are:
data.table_1.8.6 RSQLite_0.11.2 DBI_0.2-5
I have tried this on multiple systems and on different Windows versions, and in all cases it takes an incredible amount of time to write this table into the SQLite database. Watching the file size of the SQLite database grow, it writes at roughly 50 KB per minute.
My question is: does anybody know what causes this slow write speed?
Tim had the answer, but I can't flag it as such because it is in the comments.
As in ideas to avoid hitting memory limit when using dbWriteTable to save an R data table inside a SQLite database, I wrote the data to the database in chunks:
chunks <- 100

# Cut the row range of DB into roughly equal chunks
starts.stops <- floor(seq(1, nrow(DB), length.out = chunks))

system.time({
  for (i in 2:length(starts.stops)) {
    # The first chunk starts at row 1; later chunks start one past the previous stop
    if (i == 2) {
      rows.to.add <- starts.stops[i - 1]:starts.stops[i]
    } else {
      rows.to.add <- (starts.stops[i - 1] + 1):starts.stops[i]
    }
    # Append each chunk to Table2 instead of writing the whole table at once
    dbWriteTable(con, 'Table2', DB[rows.to.add, ], append = TRUE)
  }
})
It now takes

   user  system elapsed
   4.49    9.90  214.26

to finish writing the data to the database. Apparently I was hitting the memory limit without knowing it.
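For reference, a rough way to check whether you are in the same situation (these are just standard base-R calls, not something from the original script) is to compare the in-memory size of the data.table with what R reports about memory on Windows:

# Rough memory sanity check (Windows)
print(object.size(DB), units = "MB")   # approximate size of the data.table in RAM
memory.size()                          # memory currently used by R, in MB (Windows only)
memory.limit()                         # current memory limit, in MB (Windows only)
gc()                                   # run garbage collection and report memory usage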