I have a very large .csv file, a few GB in size.
I want to read only the first few thousand lines of it.
Is there an efficient way to do this?

jangorecki

user2806363
- http://stackoverflow.com/questions/3094866/trimming-a-huge-3-5-gb-csv-file-to-read-into-r?rq=1 – Francisco Corrales Morales Jan 19 '14 at 20:18
- I came to this question repeatedly when looking for how to solve the same issue. I'd like to see solutions in readr, read.csv, etc. And from the number of hits, upvotes and favourites, I think it would be useful to reopen the question? – pluke Sep 15 '17 at 08:50
- This is a pretty valid question. I don't really understand why it is "too broad". Do we really need a reprex that writes a big csv just to have something to deal with? The nature of the problem of reading just part of a file is broad, not the question. – jangorecki May 09 '18 at 10:37
- Check out the argument `nrows` in `help("read.csv")`. – Rui Barradas May 09 '18 at 17:50
- I see no problem with this question at all. It's perfectly fine. – gruvn Dec 02 '22 at 17:18
2 Answers
Use the `nrows` argument in `read.csv(...)`:
df <- read.csv(file = "my.large.file.csv", nrows = 2000)
There is also a `skip=` parameter that tells `read.csv(...)` how many lines to skip before it starts reading.
If your file is that large, you might be better off using `fread(...)` from the data.table package. It accepts the same `nrows` and `skip` arguments.
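As a rough sketch of how `nrows` and `skip` combine, the example below builds a small stand-in CSV in a temporary file (the data, sizes, and variable names are hypothetical, chosen only for illustration) and reads two slices of it. Because `skip` also skips the header line, the column names are read separately with `scan()` and passed back in through `col.names`. The `fread` call is guarded, since it assumes the data.table package is installed.

```r
# Stand-in for the large file: 10,000 rows in a temporary CSV (hypothetical data).
path <- tempfile(fileext = ".csv")
write.csv(data.frame(id = 1:10000, value = (1:10000) * 2),
          path, row.names = FALSE)

# First 2000 data rows; the header line is read in addition to nrows.
first_rows <- read.csv(path, nrows = 2000)

# A later window of rows: grab the column names from the first line,
# then skip the header plus the 5000 data rows that are not wanted.
col_names <- scan(path, what = "", sep = ",", nlines = 1, quiet = TRUE)
window <- read.csv(path, skip = 1 + 5000, nrows = 2000,
                   header = FALSE, col.names = col_names)

# Same idea with data.table, only if the package is available.
if (requireNamespace("data.table", quietly = TRUE)) {
  first_rows_dt <- data.table::fread(path, nrows = 2000)
}
```

Here `window` holds data rows 5001 to 7000 of the file. With `header = FALSE`, the names have to be supplied explicitly, otherwise R falls back to `V1`, `V2`, and so on.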

Ben Bolker

jlhoward
- `skip` isn't very helpful if the first line is the row of column names. – Matthew Lundberg Jan 19 '14 at 20:29
- @MatthewLundberg In that case you can `scan()` the first line with n=1, then use `read.csv` with `skip=` and add the colnames after that. – Ari B. Friedman Nov 19 '14 at 11:25
- For `read_csv` (R 3.4.4, Win 7), the option is `n_max`, see docs (p. 6) https://cran.r-project.org/web/packages/readr/readr.pdf – Peter Feb 19 '19 at 13:29
If you're on Unix or OS X, you can use the command line:
head -n 1000 myfile.csv > myfile.head.csv
This keeps the header line plus the first 999 data rows. Then just read the smaller file into R as usual.
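Where a shell is not available, roughly the same effect can be had from inside R. The sketch below is an assumption-laden equivalent rather than a replacement for `head`: it uses `readLines()` with `n` to pull the first 1000 physical lines, then parses them with `read.csv(text = ...)`. The temporary file and its contents are hypothetical stand-ins for the real data.

```r
# Stand-in for the large file (hypothetical data in a temporary CSV).
path <- tempfile(fileext = ".csv")
write.csv(data.frame(id = 1:5000, value = (1:5000) / 10),
          path, row.names = FALSE)

# Read only the first 1000 physical lines: the header plus 999 data rows.
head_lines <- readLines(path, n = 1000)

# Optionally save them, mirroring `head -n 1000 ... > myfile.head.csv`.
head_path <- tempfile(fileext = ".csv")
writeLines(head_lines, head_path)

# Parse the lines directly without touching the rest of the file.
head_df <- read.csv(text = head_lines)
```

Like `head`, this counts physical lines, so a quoted field containing an embedded newline could be cut in half at the boundary.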

Ari B. Friedman