
I have this simple R program that reads a table (1,000,000 rows, 10 columns) from an SQLite database into an R data.table, does some operations on the data, and then tries to write it back into a new table of the same SQLite database. Reading the data takes a few seconds, but writing the table back into the SQLite database takes hours. I don't know how long exactly because it has never finished; the longest I have let it run is 8 hours.

This is the simplified version of the program:

  library(DBI)
  library(RSQLite)
  library(data.table)

  driver = dbDriver("SQLite")
  con = dbConnect(driver, dbname = "C:/Database/DB.db")

  DB <- data.table(dbGetQuery(con, "SELECT * from Table1"))  

  dbSendQuery(con, "DROP TABLE IF EXISTS Table2")
  dbWriteTable(con, "Table2", DB)
  dbDisconnect(con)
  dbUnloadDriver(driver)

I'm using R version 2.15.2; the package versions are:

data.table_1.8.6 RSQLite_0.11.2 DBI_0.2-5

I have tried this on multiple systems and different Windows versions, and in all cases it takes an incredibly long time to write this table into the SQLite database. Watching the file size of the SQLite database, it grows at about 50 KB per minute.

My question is: does anybody know what causes this slow write speed?

Tim had the answer but I can't flag it as such because it is in the comments.

As in Ideas to avoid hitting memory limit when using dbWriteTable to save an R data table inside a SQLite database, I wrote the data to the database in chunks:

  chunks <- 100

  # 100 roughly equally spaced row indices from 1 to nrow(DB), used as chunk boundaries
  starts.stops <- floor( seq( 1 , nrow( DB ) , length.out = chunks ) )

  system.time({
    for ( i in 2:( length( starts.stops ) ) ){

      # rows for this chunk; after the first chunk, start one past the previous boundary
      if ( i == 2 ){
        rows.to.add <- ( starts.stops[ i - 1 ] ):( starts.stops[ i ] )
      } else {
        rows.to.add <- ( starts.stops[ i - 1 ] + 1 ):( starts.stops[ i ] )
      }

      # append this chunk to Table2 instead of writing the whole table at once
      dbWriteTable( con , 'Table2' , DB[ rows.to.add , ] , append = TRUE )
    }
  })

It takes:

   user  system elapsed 
   4.49    9.90  214.26 

time to finish writing the data to the database. Apparently I was hitting the memory limit without knowing it.

  • I just wrote a 1e6 x 10 data frame to an SQLite table from R and it took 7 seconds. Can you share the output of `str()` on your data? Can you provide some more details about your computer? – joran Mar 13 '13 at 16:22
  • Can you write a small amount of data to that table (`head(DB)`, for example)? – Richie Cotton Mar 13 '13 at 16:30
  • What happens if you try to write to a new table3 and then drop table2 afterwards? – Tim Mar 13 '13 at 17:10
  • It could be the middleware that is causing the slowness. Here is a question that might be germane: http://stackoverflow.com/questions/14246683/ideas-to-avoid-hitting-memory-limit-when-using-dbwritetable-to-save-an-r-data-ta – Tim Mar 13 '13 at 18:09

1 Answer


Use a single transaction (commit) for all the records. Add a

  dbSendQuery(con, "BEGIN")

before the insert and a

  dbSendQuery(con, "END")

after it to commit. Much faster.
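
As a minimal sketch, this is the suggestion applied to the write from the question (note that, depending on the RSQLite version, dbWriteTable may already manage a transaction of its own, so the explicit BEGIN/END may need adjusting):

  dbSendQuery(con, "BEGIN")        # open one transaction for all the inserts
  dbWriteTable(con, "Table2", DB)  # the write from the question
  dbSendQuery(con, "END")          # commit everything once at the end

Without an explicit transaction, SQLite commits after every insert, which is what makes row-by-row writes so slow.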
