0

I have a really large table (8M rows) that I need to import in R on which I will be doing some processing. Problem is when I try to bring it in R using the DBI package I get an error

My code is below

options(java.parameters = "-Xmx8048m") 
psql.jdbc.driver <- "../postgresql-42.2.1.jar"
jdbc.url <- "jdbc:postgresql://server_url:port"
pgsql <- JDBC("org.postgresql.Driver", psql.jdbc.driver, "`")
con <- dbConnect(pgsql, jdbc.url, user="", password= '')
tbl <- dbGetQuery(con, "SELECT * FROM my_table;")

And the error I get is

Error in .verify.JDBC.result(r, "Unable to retrieve JDBC result set for ",  : 
Unable to retrieve JDBC result set for SELECT * FROM my_table; (Ran out of memory retrieving query results.) 

I can understand its because the result set is too big but I am not sure how to retrieve it by batches instead of all of it together. I have tried using dBSendQuery, dbReadTable and dbGetQuery all of them give the same error.

Any help would be appreciated!

Jonathan Leffler
  • 730,956
  • 141
  • 904
  • 1,278
  • `"SELECT * FROM my_table FETCH FIRST 1000 ROWS ONLY;"` ? – David Klotz Feb 07 '18 at 15:44
  • Hey @DavidKlotz I can fetch the data if I subset it but I want the entire chuck (all 8M rows) in a dataframe. This will only get me 1000 rows – Srinivasan Iyer Feb 07 '18 at 15:48
  • Well you just don't have enough memory to do that :). Check out the `dbplyr` package, it will allow you, from R, to do some preprocessing operations on the server side so you can bring only a reasonable amount of data to R – moodymudskipper Feb 07 '18 at 16:30
  • Still, 8M rows is not that much, so check David's suggestion to see if at least it works – moodymudskipper Feb 07 '18 at 16:31

1 Answers1

1

I got it to work by using the RPostgreSQL package instead of the default RJDBC and DBI package. It was able to do a sendQuery and then used fetch recursively to get the data in chunks of 10,000.

main_tbl <- dbFetch(postgres_query, n=-1) #didnt work so tried in chunks 
df<- data.frame()
while (!dbHasCompleted(postgres_query)) {
  chunk <- dbFetch(postgres_query, 10000)
  print(nrow(chunk))
  df = rbind(df, chunk)
}