
I would like to process a large database table in chunks. The idea is to read some rows, process them, append the result to another table, and so on. In code:

    stmt <- "SELECT * FROM input_table WHERE cond"
    rs <- DBI::dbSendQuery(con, stmt)
    while (!DBI::dbHasCompleted(rs)) {
        current_set <- DBI::dbFetch(rs, 50000)
        res <- process(current_set)
        dbWriteTable(con, "output_table", value=res, append=TRUE)
    }
    DBI::dbClearResult(rs)

However, I get the message "Closing open result set, pending rows". Is there any way to save the intermediate output?

I would like to work with SQLite and later on with Postgres.

Karsten W.
    Maybe just have two connection objects `con1` and `con2` rather than trying to talk to _two different backends_ through the same connection? – Dirk Eddelbuettel Dec 10 '19 at 00:26
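
For illustration, a minimal sketch of that two-connection suggestion (assuming an SQLite file `mydb.sqlite`; the table names and the `process()` function are the ones from the question): one connection streams the input rows while the other only writes, so the pending result set is never interrupted. Note that SQLite may still report the database as locked while the reader holds the file, whereas Postgres handles the two sessions independently.

    library(DBI)

    # Two separate connections: con1 streams the input rows, con2 receives
    # the processed batches, so dbWriteTable() never runs on a connection
    # that still has a pending result set.
    con1 <- dbConnect(RSQLite::SQLite(), "mydb.sqlite")
    con2 <- dbConnect(RSQLite::SQLite(), "mydb.sqlite")

    rs <- dbSendQuery(con1, "SELECT * FROM input_table WHERE cond")
    while (!dbHasCompleted(rs)) {
        current_set <- dbFetch(rs, 50000)
        res <- process(current_set)
        dbWriteTable(con2, "output_table", value = res, append = TRUE)
    }
    dbClearResult(rs)
    dbDisconnect(con1)
    dbDisconnect(con2)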

1 Answer


Just for reference, I ended up with a solution using a LIMIT / OFFSET construct. Not sure whether it is efficient, but it is fast enough for my case (700k rows).

    batchsize <- 50000
    stmt <- "SELECT * FROM input_table WHERE cond"
    lim <- paste("LIMIT", batchsize, ";")
    finished <- FALSE
    i <- 0
    while (!finished) {
        # Fetch the next window of rows, process it, and append the result.
        curr_stmt <- paste(stmt, lim)
        current_set <- dbGetQuery(con, curr_stmt)
        res <- process(current_set)
        dbWriteTable(con, "output_table", value = res, append = TRUE)
        # A batch shorter than batchsize means the input table is exhausted.
        finished <- nrow(current_set) < batchsize
        i <- i + nrow(current_set)
        lim <- paste("LIMIT", batchsize, "OFFSET", i, ";")
    }
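
One thing to keep in mind with LIMIT / OFFSET: without an ORDER BY the database is free to return rows in any order, so the windows are not guaranteed to be disjoint across iterations. If the table has a unique key (an `id` column is assumed here purely for illustration), ordering by it keeps the batches stable:

    # Hypothetical `id` column; ordering makes the OFFSET windows deterministic.
    stmt <- "SELECT * FROM input_table WHERE cond ORDER BY id"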
Karsten W.