What are objective benefits and drawbacks of read.csv() versus read_csv()?

Question

This is a more conceptual question, but in class today, I was told by my professor that it would be preferable to use read_csv rather than read.csv. For more context, we are working with tidyverse in this class.

As such, since read_csv and read.csv (as far as I'm aware) both read CSV files, what are the objective benefits and drawbacks of using one function versus the other?

There are multiple criteria and they conflict. These include minimizing dependencies, consistency with the rest of the program (does it use base or tidyverse), and performance. — G. Grothendieck, Oct 03 '18 at 01:05
This question is being [discussed on meta](https://meta.stackoverflow.com/questions/418412) — cigien, May 30 '22 at 18:46

score 12 · Accepted Answer · edited Sep 29 '20 at 12:50

read_csv is significantly faster for large .csv files. See here for more information. Personally, I pretty much always use read_csv by default.
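Relative speed depends on package versions, file size, and column types, so it is worth timing both readers on your own data. Below is a minimal sketch, assuming the readr package is installed (version 2.0 or later for `show_col_types`). The generated data, row count, and variable names are arbitrary placeholders, not taken from any benchmark.

```r
# Sketch: time read.csv() and read_csv() on a throwaway CSV file.
# The data below is made up purely for illustration.
library(readr)  # assumed installed; read.csv() ships with base R (utils)

set.seed(1)
n <- 1e5
df <- data.frame(
  id = seq_len(n),
  x  = rnorm(n),
  g  = sample(letters, n, replace = TRUE)
)

# Write the data to a temporary file so both readers parse the same input.
path <- tempfile(fileext = ".csv")
write.csv(df, path, row.names = FALSE)

# system.time() returns user, system, and elapsed seconds for each call.
t_base  <- system.time(base_df  <- read.csv(path))
t_readr <- system.time(readr_df <- read_csv(path, show_col_types = FALSE))

print(rbind(read.csv = t_base[1:3], read_csv = t_readr[1:3]))

# The return types differ as well: a plain data.frame versus a tibble.
print(class(base_df))
print(class(readr_df))

# Clean up the temporary file.
unlink(path)
```

A single run like this is noisy. Repeating the timings several times, and on a file that resembles your real data, gives a more trustworthy comparison.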

edited Sep 29 '20 at 12:50

Nuclear03020704

answered Oct 02 '18 at 22:21

jalind

I would cast some doubt on those blog results. For starters, they conflict with the official tidyverse / readr site: "*[readr is] slower (currently ~1.2-2x slower). If you want absolutely the best performance, use data.table::fread().*" (https://readr.tidyverse.org/) – thelatemail Oct 03 '18 at 01:35

@thelatemail You're correct; the cited blog is now two years old and at the very least is out of date. `fread` (as of version 1.11) automatically uses parallel processing. In every experiment I've run, on a variety of file sizes, `read.csv` is slowest, `read_csv` is 2-3x faster, and `fread` is 2-3x faster again. – Robert McDonald Dec 30 '18 at 19:36
