
I am using Pandas to read CSV data, but Python's `csv` module can also handle CSV files.

What is the difference between the two?

What are the drawbacks of using Pandas instead of the `csv` module?

Peter Mortensen
Aarsh
    `csv` is a module for parsing CSV data. `pandas` really has *nothing* to do with CSVs per se; rather, it is a data analysis library for panel data that provides a DataFrame data structure. You shouldn't be using pandas *merely* to parse CSVs. That's like swatting a fly with a sledgehammer. – juanpa.arrivillaga Jun 01 '20 at 19:13
    This is an opinion-based question, and therefore off-topic for this site. – Nicolas Gervais Jun 01 '20 at 19:59

5 Answers


Based on benchmarks:

  • The `csv` module is faster at loading small datasets (under 1K rows).

  • Pandas is several times faster for larger datasets.

Code to Generate Benchmarks

[Figure: CSV and Pandas Benchmarks]
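
The original benchmark code is not included here. As a rough stand-in, this is a minimal timing sketch (not the author's code) that loads the same generated CSV text with `csv.DictReader` and with `pd.read_csv`. The helper names, the three-column synthetic data, and the row counts are all hypothetical. Note the comparison is not quite apples to apples: `csv` keeps every value as a string, while Pandas also infers numeric types.

```python
import csv
import io
import timeit

import pandas as pd


def make_csv_text(n_rows: int) -> str:
    """Build hypothetical CSV text with a header and n_rows numeric rows."""
    lines = ["a,b,c"]
    lines.extend(f"{i},{i * 2},{i * 0.5}" for i in range(n_rows))
    return "\n".join(lines) + "\n"


def load_with_csv(text: str) -> list[dict[str, str]]:
    # csv.DictReader yields one dict per row; values remain strings.
    return list(csv.DictReader(io.StringIO(text)))


def load_with_pandas(text: str) -> pd.DataFrame:
    # pd.read_csv builds a DataFrame and infers column dtypes.
    return pd.read_csv(io.StringIO(text))


def compare(n_rows: int, repeats: int = 5) -> tuple[float, float]:
    """Return the best-of-`repeats` load time in seconds for each approach."""
    text = make_csv_text(n_rows)
    t_csv = min(timeit.repeat(lambda: load_with_csv(text), number=1, repeat=repeats))
    t_pd = min(timeit.repeat(lambda: load_with_pandas(text), number=1, repeat=repeats))
    return t_csv, t_pd


if __name__ == "__main__":
    for n in (100, 1_000, 100_000):
        t_csv, t_pd = compare(n)
        print(f"{n:>7} rows  csv: {t_csv:.4f}s  pandas: {t_pd:.4f}s")
```

Results depend heavily on hardware and library versions, so treat any crossover point as specific to your machine.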

DarrylG
  1. `csv` is part of Python's standard library, but Pandas is not. If you only need to read a CSV file, don't install Pandas just for that: it is a third-party package, and adding unnecessary dependencies to a project is not good practice.
  2. If you want to analyze the data in a CSV file with Pandas, Pandas loads the file into a DataFrame, the structure it uses for data manipulation. In that case you should not use the `csv` module.
  3. If you have large volumes of data, you should consider libraries such as NumPy and Pandas.
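
To illustrate the first two points, here is a small sketch computing the same per-group total twice: once with only the standard library (`csv` plus `collections`) and once with a Pandas DataFrame. The sample data and column names (`city`, `sales`) are hypothetical; with a real file you would open it with `open(path, newline="")` instead of using `io.StringIO`.

```python
import csv
import io
from collections import defaultdict

import pandas as pd

# Hypothetical sample data standing in for a CSV file on disk.
DATA = """city,sales
Oslo,10
Bergen,5
Oslo,7
"""


def totals_with_csv(text: str) -> dict[str, int]:
    # Standard library only: parse rows and aggregate by hand.
    totals: dict[str, int] = defaultdict(int)
    for row in csv.DictReader(io.StringIO(text)):
        totals[row["city"]] += int(row["sales"])  # csv yields strings, so convert explicitly
    return dict(totals)


def totals_with_pandas(text: str) -> dict[str, int]:
    # Pandas: load into a DataFrame, then group and sum in one expression.
    df = pd.read_csv(io.StringIO(text))
    return df.groupby("city")["sales"].sum().to_dict()


print(totals_with_csv(DATA))     # {'Oslo': 17, 'Bergen': 5}
print(totals_with_pandas(DATA))
```

For a one-off aggregation like this, the `csv` version needs no extra dependency; the Pandas version pays off once the analysis grows beyond simple loops.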
Peter Mortensen

Pandas is better than the `csv` module for managing data and operating on it. The `csv` module doesn't provide the data manipulation tools that Pandas does.

If you are only talking about reading the file, it depends. You can look up both modules, but I generally find Pandas more comfortable to work with. It also gives more readable output, since printing a DataFrame produces a formatted table.
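
A quick sketch of the readability difference: the same hypothetical two-row CSV read with `csv.reader` (a plain list of lists of strings) and with `pd.read_csv` (a DataFrame that prints as a column-aligned table with an index).

```python
import csv
import io

import pandas as pd

# Hypothetical sample data.
DATA = "name,age\nAda,36\nLinus,28\n"

rows = list(csv.reader(io.StringIO(DATA)))
print(rows)  # [['name', 'age'], ['Ada', '36'], ['Linus', '28']]

df = pd.read_csv(io.StringIO(DATA))
print(df)  # column-aligned table; 'age' is parsed as an integer column
```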

Peter Mortensen
snatchysquid

I prefer Pandas because it is much faster for large CSV files. Pandas also offers functionality that the `csv` module lacks.
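
As one example of that extra functionality, `pd.read_csv` can select columns, set dtypes, and parse dates while loading, which with the `csv` module you would do by hand for each row. The sample data, column names, and chosen dtype below are hypothetical.

```python
import io

import pandas as pd

# Hypothetical sample data with a column we don't need.
DATA = "id,date,value,note\n1,2020-06-01,3.5,a\n2,2020-06-02,4.0,b\n"

df = pd.read_csv(
    io.StringIO(DATA),
    usecols=["id", "date", "value"],  # load only the columns you need
    dtype={"id": "int32"},            # set a column's type up front
    parse_dates=["date"],             # turn date strings into datetime64 values
)
print(df.dtypes)
```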

Peter Mortensen

Because Pandas by default loads the whole file into memory, reading a large CSV file (say, greater than 6 GB) can occasionally run into memory-related performance problems.

To handle huge CSV files, you can use the `csv` module together with Pandas to process the data in smaller parts. This approach is memory-friendly and helps avoid memory problems.
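
A minimal sketch of that idea, assuming the goal is a simple per-column aggregate: read rows with `csv.reader`, take fixed-size batches with `itertools.islice`, and build a small DataFrame per batch so only one batch is in memory at a time. The function names, batch size, and the column-sum operation are hypothetical. For comparison, `pd.read_csv` also has a built-in `chunksize` parameter that yields DataFrames piece by piece, which is often the simpler route.

```python
import csv
import io
from itertools import islice

import pandas as pd


def sum_column_in_batches(f, column: str, batch_size: int = 10_000) -> float:
    """Sum one numeric column using csv.reader batches (hypothetical helper)."""
    reader = csv.reader(f)
    header = next(reader)  # first row holds the column names
    total = 0.0
    while True:
        batch = list(islice(reader, batch_size))  # at most batch_size rows in memory
        if not batch:
            break
        chunk = pd.DataFrame(batch, columns=header)
        total += pd.to_numeric(chunk[column]).sum()  # csv gives strings; convert first
    return total


def sum_column_with_chunksize(f, column: str, batch_size: int = 10_000) -> float:
    """Same result using pandas' own chunked reader."""
    total = 0.0
    for chunk in pd.read_csv(f, chunksize=batch_size):
        total += chunk[column].sum()
    return total


if __name__ == "__main__":
    # Hypothetical data; with a real file use: open("big.csv", newline="")
    text = "x,y\n" + "".join(f"{i},{i * 2}\n" for i in range(25))
    print(sum_column_in_batches(io.StringIO(text), "y", batch_size=10))
    print(sum_column_with_chunksize(io.StringIO(text), "y", batch_size=10))
```

Peak memory then scales with `batch_size` rather than with the file size, at the cost of only supporting operations that can be combined across chunks.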

Peter Mortensen
Tariq Ahmed
    Why 6 GB? Is that from your own experience? What system was it tested on (including the Python and Pandas versions; hardware, including the size of physical and virtual memory; and operating system, including version and edition)? What kind of CSV file was it (how many lines, how many data items per line, what kind of data)? Do you have a source? – Peter Mortensen Aug 19 '23 at 12:43