
I have a table in a MySQL database that contains 312 million rows and 10 fields/columns. I am trying to import this data into R using library("RMySQL"), but this kills my desktop (8 GB RAM, 64-bit AMD dual-core processor, running Linux Mint 15). Is there an efficient way to import all this data into R, for example using the "ff" library?

joran
user3006691
  • Use read.dbi.ffdf from the ETLUtils package. It does exactly this: it fetches all the data returned by an SQL query into an ffdf (stored on disk, not in RAM). For an example with MySQL, follow [this link](http://bnosac.be/index.php/blog/21-readodbcffdf-a-readdbiffdf-for-fetching-large-corporate-sql-data). If you have tried it out, you might change your question into a programming question, which is what Stack Overflow is for. –  Nov 19 '13 at 17:32
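
A minimal sketch of that approach (the database name, credentials, table name and chunk sizes below are placeholders, and the argument names follow the ETLUtils documentation as I recall it, so check ?read.dbi.ffdf for the exact interface):

# Sketch: pull the table chunk by chunk into an on-disk ffdf via ETLUtils
library(RMySQL)
library(ETLUtils)

dat <- read.dbi.ffdf(
  query = "SELECT * FROM mytable",
  dbConnect.args = list(
    drv = dbDriver("MySQL"),
    dbname = "mydb", user = "myuser",
    password = "mypassword", host = "localhost"),
  first.rows = 100000,   # rows fetched in the first chunk
  next.rows = 500000,    # rows fetched in each subsequent chunk
  VERBOSE = TRUE)

class(dat)   # "ffdf": the data lives on disk, not in RAM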

1 Answer


It would not fit into your RAM without using a "big data" package. A quick demonstration with a matrix of 1 million rows and 10 columns:

> m <- matrix(runif(1e7), 1e6, 10)
> object.size(m) / 1024 / 1024
76.2941360473633 bytes

So it takes around 76 MB of RAM. Your 312 million rows would need something like:

> object.size(m) / 1024 / 1024 * 312
23803.7704467773 bytes

So around 24 GB of RAM.

Do you really need all of that data in RAM? Why not subset or aggregate inside MySQL and pass to R only what is really needed? Give dplyr a try.
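
A sketch of that idea, assuming a dplyr setup with a database backend (dbplyr installed) and hypothetical column names; only the aggregated result is transferred to R:

# Sketch: let MySQL do the filtering/aggregation, fetch only the result
library(DBI)
library(RMySQL)
library(dplyr)    # with dbplyr installed, tbl() translates dplyr verbs to SQL

con <- dbConnect(RMySQL::MySQL(), dbname = "mydb",
                 user = "myuser", password = "mypassword",
                 host = "localhost")

big <- tbl(con, "mytable")          # lazy reference, nothing is fetched yet

result <- big %>%
  filter(some_column > 100) %>%     # hypothetical columns, runs as SQL WHERE
  group_by(another_column) %>%
  summarise(n = n(),
            avg = mean(numeric_column, na.rm = TRUE)) %>%
  collect()                         # only the aggregated rows come into RAM

dbDisconnect(con)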

daroczig