
I have a MySQL table I am attempting to access with R using RMySQL.

There are 1,690,004 rows that should be returned by

dbGetQuery(con, "SELECT * FROM tablename WHERE export_date ='2015-01-29'")

Unfortunately, I receive the following warning messages:

In is(object, Cl) : error while fetching row
In dbGetQuery(con, "SELECT * FROM tablename WHERE export_date ='2015-01-29'",  : pending rows

I receive only ~400K rows.

If I break the query into several "fetches" using dbSendQuery, the warning messages start appearing after ~400K rows are received.
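For reference, the chunked fetch looked roughly like this (a sketch using the generic DBI interface — `dbSendQuery`/`dbFetch`/`dbHasCompleted` — with the connection `con` assumed to already exist):

```r
library(DBI)

res <- dbSendQuery(con, "SELECT * FROM tablename WHERE export_date = '2015-01-29'")
chunks <- list()
while (!dbHasCompleted(res)) {
  chunks[[length(chunks) + 1]] <- dbFetch(res, n = 50000)  # pull up to 50k rows per fetch
}
dbClearResult(res)
result <- do.call(rbind, chunks)
```

The warnings appear partway through this loop, after roughly 400K rows have been fetched.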

Any help would be appreciated.

Optimus
    Try using the dev version at https://github.com/rstats-db/RMySQL. If you still get the error, please create a reproducible example and file a bug. – hadley Feb 26 '15 at 12:25
  • I can't even connect now. Using the same connection parameters as before, it says 'Error in .Call("RMySQL_connection_create", PACKAGE = "RMySQL", host, user, : "RMySQL_connection_create" not available for .Call() for package "RMySQL"' – Optimus Feb 26 '15 at 14:13
  • It's possible that the original problem is due to webhosting limits (I tried querying via MySQL Workbench and I get a connection lost error). – Optimus Feb 26 '15 at 14:53

1 Answer


So, it looks like it was due to a 60-second timeout imposed by my hosting provider (damn Arvixe!). I worked around it by "paging/chunking" the output. Because my data has an auto-incrementing primary key, the rows come back in order, which lets me take the next X rows after each iteration.

To get 1.6M rows I did the following:

library(RMySQL)
con <- MySQLConnect() # mysql connection function
day <- '2015-01-29' # date of interest
numofids <- 50000 # number of rows to include in each 'chunk'
count <- dbGetQuery(con, paste0("SELECT COUNT(*) as count FROM tablename WHERE export_date = '",day,"'"))$count # get the number of rows returned from the table.
dbDisconnect(con)
ns <- seq(1, count, numofids) # get sequence of rows to work over
tosave <- data.frame() # data frame to bind results to
# iterate through table to get data in 50k row chunks
for(nextseries in ns){ # for each chunk offset
  print(nextseries) # print progress
  con <- MySQLConnect()
  d1 <- dbGetQuery(con, paste0("SELECT * FROM tablename WHERE export_date = '",day,"' LIMIT ", nextseries - 1,",",numofids)) # extract the next chunk of up to 50k rows (the LIMIT offset is 0-based, hence the - 1)
  dbDisconnect(con)
  # bind data to the tosave data frame (the if/else avoids an error from rbind-ing d1 to an empty data frame on the first pass).
  if(nrow(tosave)>0){
      tosave <- rbind(tosave, d1)
  }else{
      tosave <- d1
  }
}
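
Since the table has an auto-incrementing primary key, an alternative worth considering is keyset pagination: instead of LIMIT with a growing offset (which MySQL satisfies by scanning and discarding all the skipped rows, so later chunks get slower), filter on the key itself. This is a sketch, not the code I ran — it assumes the key column is named `id` (adjust to your schema) and collects chunks in a list rather than rbind-ing inside the loop, which avoids repeated copying:

```r
library(RMySQL)
day <- '2015-01-29'   # date of interest
chunksize <- 50000    # rows per chunk
last_id <- 0          # resume point: highest key seen so far
chunks <- list()
repeat {
  con <- MySQLConnect() # mysql connection function
  d1 <- dbGetQuery(con, paste0(
    "SELECT * FROM tablename WHERE export_date = '", day,
    "' AND id > ", last_id,
    " ORDER BY id LIMIT ", chunksize))
  dbDisconnect(con)
  if (nrow(d1) == 0) break            # no rows left
  last_id <- max(d1$id)               # continue after the last key fetched
  chunks[[length(chunks) + 1]] <- d1
}
tosave <- do.call(rbind, chunks)
```

Each query here touches only the next 50k rows by index, so every chunk stays well under the 60-second limit regardless of how deep into the table it is.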
Optimus
  • Great answer! It worked for me even better when I added `Sys.sleep(2)` in the `for` loop. One thing though; the part `LIMIT ", nextseries,",",numofids))` should be `LIMIT ", nextseries - 1,",",numofids))` or else you're missing the first row of the total dataset. – MS Berends Apr 20 '17 at 07:54