11

I am working with an Oracle database and like to fetch a table with 30 million records.

library(RODBC)
ch <- odbcConnect("test", uid="test_user", 
                  pwd="test_pwd", 
                  believeNRows=FALSE, readOnly=TRUE)
db <- sqlFetch(ch, "test_table")

For 1 million records the process needs 1074.58 sec. Thus, it takes quite a while for all 30 million records. Is there any possiblity to speed up the process?

I would appreciate any help. Thanks.

Jilber Urbina
  • 58,147
  • 10
  • 114
  • 138
majom
  • 7,863
  • 7
  • 55
  • 88
  • 2
    RODBC against Oracle is crazy slow. Dump to a local csv at the database end, transfer over network file sharing and then load into R via sqldf::read.csv.sql. See: http://www.cerebralmastication.com/2009/11/loading-big-data-into-r/ – Hansi Nov 08 '12 at 14:24
  • Thanks for the link. Do you have any experience whether the "ROracle" package is better in terms of performance? To me, it really seems counterintuitive to create a dump. – majom Nov 08 '12 at 14:35
  • @majom At work, I have to write queries against an Oracle Data warehouse and pulling the data into R is painfully slow. I have never been able to get ROracle up and running. Instead of sqlFetch, I try to write targeted queries to get at the data I need. This minimizes the data transfer (a bit) and leverages the Oracle in-database speed. I don't work with data as large as yours, so my process fits my needs (for now). Not ideal, but works. – Btibert3 Nov 08 '12 at 15:14
  • Thanks, Btibert3. I just came across the following performance comparison between ROracle and RJDBC http://guyharrison.squarespace.com/blog/2011/5/20/rjdbc-20-performance-improvements.html. Probably, I have really to try ROracle, even though I am not sure if RODBC and RJDBC are 100% comparable in terms of performance. – majom Nov 08 '12 at 23:36

1 Answers1

0

You could try making a system call through the R terminal to a mySQL shell using the system() command. Process your data externally and only load what you need as output.

JClarke09
  • 141
  • 2
  • 9