I am reading a 10 GB CSV file using Dask. After performing some operations, I export it back to CSV using to_csv. The problem is that exporting this file takes around 27 minutes (according to the ProgressBar diagnostics).
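That timing comes from wrapping the export in Dask's ProgressBar diagnostic, roughly like this minimal sketch (the path is a placeholder; the full code is further down):

import dask.dataframe as dd
from dask.diagnostics import ProgressBar

df = dd.read_csv('path\\to\\csv')  # placeholder path

# ProgressBar reports progress and elapsed time for the computation
# that to_csv triggers.
with ProgressBar():
    df.to_csv('export.csv', single_file=True)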
The CSV file has 350 columns: one timestamp column, and the rest with dtype float64.
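For illustration, a minimal sketch of how that schema could be declared explicitly at read time (the column names timestamp and value_0 ... value_348 are hypothetical placeholders, not the real ones):

import dask.dataframe as dd

# Hypothetical column names; the real file has one timestamp column
# and 349 float64 columns.
float_cols = [f'value_{i}' for i in range(349)]

df = dd.read_csv(
    'path\\to\\csv',  # placeholder path
    dtype={col: 'float64' for col in float_cols},
    parse_dates=['timestamp'],  # assumed name of the timestamp column
)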
- Machine Specs:
- Intel i7-4610M @ 3.00 GHz
- 8 GB DDR3 RAM
- 500 GB SSD
- Windows 10 Pro
I have tried exporting to separate files with to_csv('filename-*.csv'), and also without the .csv suffix, in which case Dask writes the files with a .part extension. Both approaches take about the same time as above.
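Concretely, the two variants look roughly like this (the second form is an approximation of what I ran):

import dask.dataframe as dd

df = dd.read_csv('path\\to\\csv')  # placeholder path

# Variant 1: one CSV per partition, with '*' replaced by the partition index
df.to_csv('filename-*.csv')

# Variant 2: the same call without the .csv suffix; the resulting files
# came out with a .part extension
df.to_csv('filename-*')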
I don't think I/O should be the bottleneck since I am using an SSD, but I am not sure about that.
Here is my code (simplified):
import dask.dataframe as dd

df = dd.read_csv('path\\to\\csv')
# Doing some operations using df.loc
df.to_csv('export.csv', single_file=True)
I am using Dask v2.6.0.
Expected outcome: complete this process in less time without changing the machine specs.
Is there any way I can export this file in less time?