How to read the first 1000 lines of a .csv file into R?

Question

I have a very big .csv file, a few GB in size.
I want to read only the first few thousand lines of it. Is there an efficient way to do this?

http://stackoverflow.com/questions/3094866/trimming-a-huge-3-5-gb-csv-file-to-read-into-r?rq=1 — Francisco Corrales Morales, Jan 19 '14 at 20:18
I came to this question repeatedly when looking for how to solve the same issue. I'd like to see solutions in readr, read.csv, etc. Given the number of hits, upvotes and favourites, I think it would be useful to reopen the question. — pluke, Sep 15 '17 at 08:50
This is a pretty valid question. I don't really understand why it is "too broad". Do we really need a reprex that writes a big csv just to have something to deal with? The nature of the problem of reading just part of a file is broad, not the question. — jangorecki, May 09 '18 at 10:37
I see no problem with this question at all. It's perfectly fine. — gruvn, Dec 02 '22 at 17:18

score 103 · Answer 1 · edited Jan 19 '14 at 21:11

Use the nrows argument in read.csv(...)

df <- read.csv(file = "my.large.file.csv", nrows = 2000)

There is also a skip= parameter that tells read.csv(...) how many lines to skip before you start reading.
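As a minimal sketch of how nrows and skip can be combined, the snippet below first writes a small temporary file (a hypothetical stand-in for the large CSV), reads its first rows, and then reads a later block of rows. Because skip counts the header line as well, the column names are read separately and passed back in through col.names; the file contents and variable names here are illustrative assumptions only.

```r
# Hypothetical stand-in for the large file: 100 data rows plus a header line.
path <- tempfile(fileext = ".csv")
write.csv(data.frame(id = 1:100, value = (1:100) * 2), path, row.names = FALSE)

# First 10 data rows; the header line is not counted by nrows.
first_rows <- read.csv(path, nrows = 10)

# Data rows 21 to 30: take the column names from a one-row read,
# then skip the header line plus the first 20 data rows.
col_names <- names(read.csv(path, nrows = 1))
chunk <- read.csv(path, skip = 21, nrows = 10, header = FALSE,
                  col.names = col_names)
```

Reading the header with a one-row read.csv call keeps the example short; scan() on the first line would work as well.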

If your file is that large, you might be better off using fread(...) from the data.table package, which accepts the same nrows and skip arguments.
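A rough equivalent with fread might look like the sketch below. It assumes the data.table package is installed, and the temporary file and variable names are made up for illustration. When skipping into the middle of the file, header = FALSE and col.names are set explicitly so that the first row read is not treated as column names.

```r
library(data.table)

# Hypothetical stand-in for the large file: 100 data rows plus a header line.
path <- tempfile(fileext = ".csv")
fwrite(data.table(id = 1:100, value = (1:100) * 2), path)

# First 10 data rows; the header line is detected automatically.
first_dt <- fread(path, nrows = 10)

# Data rows 21 to 30: skip the header line plus 20 data rows
# and reuse the column names from the first read.
chunk_dt <- fread(path, skip = 21, nrows = 10, header = FALSE,
                  col.names = names(first_dt))
```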

edited Jan 19 '14 at 21:11 by Ben Bolker
answered Jan 19 '14 at 20:21 by jlhoward

`skip` isn't very helpful if the first line is the row of column names. – Matthew Lundberg Jan 19 '14 at 20:29
@MatthewLundberg In that case you can `scan()` the first line with n=1, then use `read.csv` with `skip=` and add the colnames after that. – Ari B. Friedman Nov 19 '14 at 11:25
For `read_csv` from readr (R 3.4.4, Win 7), the option is `n_max`; see the docs (p. 6): https://cran.r-project.org/web/packages/readr/readr.pdf – Peter Feb 19 '19 at 13:29
Example: `read_csv(file="train.csv", n_max=2000)` – Peter Feb 19 '19 at 13:30

score 20 · Answer 2 · answered Jan 19 '14 at 20:23

If you're on Unix or OS X, you can use the command line:

head -n 1000 myfile.csv > myfile.head.csv

Then read myfile.head.csv into R as usual.

answered Jan 19 '14 at 20:23 by Ari B. Friedman
