3

This is a more conceptual question, but in class today, I was told by my professor that it would be preferable to use read_csv rather than read.csv. For more context, we are working with tidyverse in this class.

As such, since read_csv and read.csv (as far as I'm aware) both read CSV files, what are the objective benefits and drawbacks of using one function versus the other?

Peter Mortensen
Cameron

1 Answer

12

read_csv is significantly faster for large .csv files. See here for more information. Personally, I pretty much always use read_csv by default.
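Since speed claims like this depend heavily on package versions and on the file itself, it may be worth timing the readers on your own data. Below is a rough sketch, not a rigorous benchmark: the file is a synthetic throwaway (its size and columns are arbitrary assumptions), each reader is timed only once, and the readr and data.table readers are skipped if those packages are not installed.

```r
# Build a throwaway CSV so the comparison is self-contained.
# The row count and column mix are arbitrary choices for illustration.
set.seed(1)
n <- 1e5
df <- data.frame(
  id = seq_len(n),
  x  = rnorm(n),
  g  = sample(letters, n, replace = TRUE)
)
path <- tempfile(fileext = ".csv")
write.csv(df, path, row.names = FALSE)

# Built-in reader (utils::read.csv): always available, returns a data.frame.
timings <- c(read.csv = system.time(base_df <- read.csv(path))[["elapsed"]])

# readr reader: timed only if readr is installed; returns a tibble.
if (requireNamespace("readr", quietly = TRUE)) {
  timings["read_csv"] <- system.time(
    readr_df <- suppressMessages(readr::read_csv(path))
  )[["elapsed"]]
}

# data.table reader: timed only if data.table is installed.
if (requireNamespace("data.table", quietly = TRUE)) {
  timings["fread"] <- system.time(
    dt <- data.table::fread(path)
  )[["elapsed"]]
}

# Elapsed seconds per reader; values vary by machine and package version.
print(timings)
unlink(path)
```

A single `system.time()` call is noisy, so for anything you intend to rely on, repeat each read several times and test on a file that resembles your real data.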

Nuclear03020704
jalind
  • 1
    I would cast some doubt on those blog results. They conflict with the official tidyverse / readr site for starters - "*[readr is] slower (currently ~1.2-2x slower. If you want absolutely the best performance, use data.table::fread().*" - https://readr.tidyverse.org/ – thelatemail Oct 03 '18 at 01:35
  • 5
    @thelatemail You're correct; the cited blog is now two years old and at the very least is out of date. `fread` (as of version 1.11) automatically uses parallel processing. In every experiment I've run, on a variety of file sizes, `read.csv` is slowest, `read_csv` is 2-3x faster, and `fread` is 2-3x faster again. – Robert McDonald Dec 30 '18 at 19:36