68

I have a very big .csv file; it's around a few GB.
I want to read the first few thousand lines of it. Is there an efficient way to do this?

jangorecki
user2806363
  • http://stackoverflow.com/questions/3094866/trimming-a-huge-3-5-gb-csv-file-to-read-into-r?rq=1 – Francisco Corrales Morales Jan 19 '14 at 20:18
  • 3
    I came to this question repeatedly when looking for how to solve the same issue. I'd like to see solutions using readr, read.csv, etc. And given the number of hits, upvotes and favourites, I think it would be useful to reopen the question? – pluke Sep 15 '17 at 08:50
  • 2
    This is a pretty valid question. I don't really understand why it is "too broad". Do we really need a reprex that writes a big csv just to have something to deal with? The nature of the problem of reading just part of a file is broad, not the question. – jangorecki May 09 '18 at 10:37
  • 1
    Check out argument `nrows` in `help("read.csv")`. – Rui Barradas May 09 '18 at 17:50
  • I see no problem with this question at all. It's perfectly fine. – gruvn Dec 02 '22 at 17:18

2 Answers

103

Use the `nrows` argument in `read.csv(...)`:

df <- read.csv(file="my.large.file.csv",nrows=2000)

There is also a `skip=` parameter that tells `read.csv(...)` how many lines to skip before it starts reading.
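For example, a minimal sketch (reusing the my.large.file.csv name from above, and assuming the file has a header row) that reads data rows 1001 through 3000 while keeping the original column names:

# Read only the header line to capture the column names
hdr <- read.csv("my.large.file.csv", nrows = 1)

# Skip the header plus the first 1000 data rows, then read the next 2000;
# header = FALSE because the header line has already been skipped
chunk <- read.csv("my.large.file.csv", skip = 1001, nrows = 2000,
                  header = FALSE, col.names = names(hdr))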

If your file is that large, you might be better off using `fread(...)` from the data.table package. It takes the same arguments.
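As a rough sketch with data.table (again assuming the my.large.file.csv name), the equivalent call would be:

library(data.table)

# fread() accepts the same nrows/skip arguments and is usually much faster
# than read.csv() on multi-GB files
dt <- fread("my.large.file.csv", nrows = 2000)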

Ben Bolker
jlhoward
20

If you're on UNIX or OS X, you can use the command line:

head -n 1000 myfile.csv > myfile.head.csv

Then just read it into R as normal. (If the file has a header row, it counts as one of those 1000 lines, so you'll get 999 data rows.)
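If you'd rather skip the intermediate file, one alternative sketch (assuming a Unix-like shell and the myfile.csv name above) is to stream the output of head straight into R through a pipe connection:

# Read the first 1000 lines directly, without writing myfile.head.csv
df <- read.csv(pipe("head -n 1000 myfile.csv"))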

Ari B. Friedman