R - Read large file with small memory

Asked Mar 27 '18 at 13:23

Active May 10 '19 at 06:36

Viewed 717 times

My data is organized in a CSV file with millions of lines and several columns. This file is too large to read into memory all at once.

Fortunately, I only want to compute some statistics on it, such as the mean of each column over every 100 rows. My solution, based on other posts, was to use `read.csv2` with the `nrows` and `skip` options. This works.

However, I realized that when loading from near the end of the file this process is quite slow. As far as I can tell, the reader goes through the file until it has passed all the lines I told it to skip, and only then starts reading. This is, of course, suboptimal, as it rereads the initial lines every time.
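For reference, a minimal sketch of the `skip`/`nrows` pattern described above. The helper name `block_means_skip`, the block size, and the assumption that the file has a header line and only numeric columns are illustrative, not taken from my real code.

```r
# Hypothetical helper: column means for one block of rows, using skip/nrows.
# Assumes a header line and numeric columns in read.csv2 format
# (";" as separator, "," as decimal mark).
block_means_skip <- function(path, block, block_size = 100) {
  # Read the header plus one row, just to recover the column names.
  col_names <- names(read.csv2(path, nrows = 1))
  # Skip the header and all earlier blocks, then read one block.
  # The skipped lines are still scanned from the start of the file on
  # every call, so later blocks take longer to reach.
  chunk <- read.csv2(path, header = FALSE, col.names = col_names,
                     skip = 1 + (block - 1) * block_size,
                     nrows = block_size)
  colMeans(chunk)
}
```

Calling `block_means_skip("data.csv", 5000)` has to pass over roughly 500,000 lines before reading the 100 it needs, which matches the slowdown I see.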

Is there a solution, like a Python parser, where we can read the file line by line, stop when needed, and then continue, while keeping the nice simplicity of `read.csv2`?
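One possible direction, as a sketch only: `read.csv2` accepts a connection, and an already open connection is read from its current position, so successive calls should pick up where the previous one stopped. The helper name `block_means_stream` is hypothetical, and the code assumes a header line, numeric columns, and the `read.csv2` defaults.

```r
# Hypothetical helper: per-block column means in a single pass over the file.
block_means_stream <- function(path, block_size = 100) {
  con <- file(path, open = "r")   # open once; the read position is kept
  on.exit(close(con))
  # Read the header line once; assumes ";"-separated, possibly quoted names.
  col_names <- scan(text = readLines(con, n = 1), what = "", sep = ";",
                    quiet = TRUE)
  results <- list()
  repeat {
    chunk <- tryCatch(
      read.csv2(con, header = FALSE, col.names = col_names,
                nrows = block_size),
      # read.table signals an error once no lines remain; note that this
      # handler also hides any other read error.
      error = function(e) NULL
    )
    if (is.null(chunk) || nrow(chunk) == 0) break
    results[[length(results) + 1]] <- colMeans(chunk)
  }
  do.call(rbind, results)         # one row of means per block
}
```

Only one block is held in memory at a time, and no line is read twice. The final block may contain fewer than `block_size` rows.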


Diogo Santos

1

Take a look at `readLines` – G5W Mar 27 '18 at 13:26
I don't know how `read.csv` is implemented, but it might require sequential access of the input file. – Tim Biegeleisen Mar 27 '18 at 13:26
Possible duplicate of [Still struggling with handling large data set](https://stackoverflow.com/questions/45362126/still-struggling-with-handling-large-data-set) – F. Privé Mar 27 '18 at 14:00
2

`read.csv` is very slow. It's not suitable for large files. I usually use `data.table::fread`. – Roland Mar 27 '18 at 14:04


0 Answers