
I need to read a 4.5 GB CSV file into RStudio, and to work around the memory limit I am using the read.csv.ffdf function from the ff package. However, I still get an error message that the data is too big:

Error: cannot allocate vector of size 6607642.0 Gb

and I can't figure out why. I would really appreciate any help!

library(ff)

# keep ff's on-disk temp files on a drive with enough free space
options(fftempdir="C:/Users/Documents/")

CRSPDailyff <- read.csv.ffdf(file="CRSP_Daily_Stock_Returns_1995-2015.csv")
  • Without a reproducible example it is hard to help you. At any rate, try fread() from the data.table package (a minimal sketch appears after these comments). It will give you an object of class "data.table", which is very similar to a data.frame, though quirky sometimes. You can easily convert it to your familiar data.frame by using as.data.frame(x) or by passing data.table = FALSE to fread(). Alternatively, I suggest this post: https://rstudio-pubs-static.s3.amazonaws.com/72295_692737b667614d369bd87cb0f51c9a4b.html – Scipione Sarlo May 05 '19 at 08:28
  • Have you tried loading the data in small chunks and using rbind to join them? – Jandisson May 05 '19 at 09:20
  • Maybe you could try the `data.table` package – zhaoxg May 05 '19 at 09:55
  • Try the new vroom package that just appeared on CRAN yesterday. – G. Grothendieck May 05 '19 at 11:36
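
For reference, a minimal sketch of the approaches suggested in the comments. The file name is taken from the question; whether either approach fits this particular data set in memory on your machine is an assumption, not a guarantee.

library(data.table)

# fread() is typically much faster and more memory-efficient than read.csv();
# data.table = FALSE returns a plain data.frame instead of a data.table
CRSPDaily <- fread("CRSP_Daily_Stock_Returns_1995-2015.csv", data.table = FALSE)

library(vroom)

# vroom() indexes the file lazily, so values are only materialised when a
# column is actually used
CRSPDaily <- vroom("CRSP_Daily_Stock_Returns_1995-2015.csv")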

1 Answer


I suspect you might be able to overcome this limitation using the next.rows argument, which controls how many rows are read per chunk.

Please try:

library(ff)

options(fftempdir="C:/Users/Documents/")

CRSPDailyff <- read.csv.ffdf(file="CRSP_Daily_Stock_Returns_1995-2015.csv",
                             next.rows = 100000)

Experiment with other values for next.rows; I personally use 500000 on a 4 GB machine here on campus.
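
Once the read succeeds, the resulting ffdf object keeps its data on disk, so you can inspect it without pulling everything into RAM. A short sketch (the row range is only illustrative):

# dimensions and column names are metadata, so these calls are cheap
dim(CRSPDailyff)
names(CRSPDailyff)

# subsetting an ffdf pulls only the requested rows into RAM as a data.frame
first_rows <- CRSPDailyff[1:10, ]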

The advice from the other commenters to try data.table::fread() or vroom is also worth following if ff still struggles with a file this size.
