I need to uncompress a transactions.gz file downloaded from Kaggle: approximately 2.86 GB compressed, 350 million rows, 11 columns.

I tried the following in RStudio on Windows Vista, 32-bit, with 3 GB of RAM:

transactions <- read.table(gzfile("E:/2014/Proyectos/Kaggle/transactions.gz"))
write.table(transactions, file="E:/2014/Proyectos/Kaggle/transactions.csv")

But I receive this error message on the console:

> transactions <- read.table(gzfile("E:/2014/Proyectos/Kaggle/transactions.gz"))
Error: cannot allocate vector of size 64.0 Mb
> write.table(transactions, file="E:/2014/Proyectos/Kaggle/transactions.csv")
Error: cannot allocate vector of size 64.0 Mb

I checked this question, but it didn't work for me: Decompress gz file using R

I would appreciate any suggestions.

1 Answer

This file decompresses to a 22 GB CSV. You can't process it all at once in R on a 3 GB machine, because R needs to read everything into memory. It would be best to process it in an RDBMS such as PostgreSQL. If you are intent on using R, you can process it in chunks, reading a manageable number of rows at a time: read a chunk, process it, then overwrite it with the next one. For this, data.table::fread would be better than the standard read.table; a sketch of the chunked approach follows.
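Here is a minimal sketch of that chunked loop, assuming the file has already been decompressed to transactions.csv; the path, chunk size, and processing step are illustrative, not from the original answer:

library(data.table)

path       <- "E:/2014/Proyectos/Kaggle/transactions.csv"  # the decompressed file
chunk_size <- 1e6                                          # rows per chunk; tune to fit your RAM

# Read only the header so every chunk can reuse the column names.
col_names <- names(fread(path, nrows = 0))

offset <- 1  # lines already consumed; starting at 1 skips the header
repeat {
  chunk <- tryCatch(
    fread(path, skip = offset, nrows = chunk_size, header = FALSE),
    error = function(e) NULL  # fread errors once we skip past the end of the file
  )
  if (is.null(chunk) || nrow(chunk) == 0) break
  setnames(chunk, col_names)

  # ... process or aggregate the chunk here, keeping only the small result ...

  offset <- offset + nrow(chunk)
}

Note that fread rescans the file on every skip, so later chunks get slower; appending each chunk's rows or results to a database table (e.g. with DBI::dbWriteTable(..., append = TRUE)) is the more scalable route the answer alludes to.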

Oh, and don't decompress in R; just run gunzip from the command line and then process the CSV. If you're on Windows you can use WinZip or 7-Zip.
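As an aside (not part of the original answer): sufficiently recent versions of data.table can read gzipped files directly when the R.utils package is installed, which lets you inspect a few rows without decompressing anything by hand:

library(data.table)
# Requires the R.utils package for .gz input; reads only the first 5 rows.
fread("E:/2014/Proyectos/Kaggle/transactions.gz", nrows = 5)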
