I've recently begun using RODBC to connect to PostgreSQL as I couldn't get RPostgreSQL to compile and run in Windows x64. I've found that read performance is similar between the two packages, but write performance is not. For example, using RODBC (where z is a ~6.1M row dataframe):
library(RODBC)
con <- odbcConnect("PostgreSQL84")
#autoCommit=FALSE seems to speed things up
odbcSetAutoCommit(con, autoCommit = FALSE)
system.time(sqlSave(con, z, "ERASE111", fast = TRUE))
user system elapsed
275.34 369.86 1979.59
odbcEndTran(con, commit = TRUE)
odbcCloseAll()
Whereas for the same ~6.1M row dataframe using RPostgreSQL (under 32-bit):
library(RPostgreSQL)
drv <- dbDriver("PostgreSQL")
con <- dbConnect(drv, dbname="gisdb", user="postgres", password="...")
system.time(dbWriteTable(con, "ERASE222", z))
user system elapsed
467.57 56.62 668.29
dbDisconnect(con)
So, in this test, RPostgreSQL is about 3X as fast as RODBC in writing tables. This performance ratio seems to stay more-or-less constant regardless of the number of rows in the dataframe (but the number of columns has far less effect). I do notice that RPostgreSQL uses something like COPY <table> FROM STDIN
while RODBC issues a bunch of INSERT INTO <table> (columns...) VALUES (...)
queries. I also notice that RODBC seems to choose int8 for integers, while RPostgreSQL chooses int4 where appropriate.
I need to do this kind of dataframe copy often, so I would very sincerely appreciate any advice on speeding up RODBC. For example, is this just inherent in ODBC, or am I not calling it properly?