I have a huge CSV file with date columns like this:
a,b,c
10-15,2008-10-20,2008-10-21
1996-06-20,1996-06-21,1996-07-25
1998-06-22,1998-06-23,1998-06-23
.
.
.
I want to read this file into a Pandas DataFrame while storing the dates directly as datetime64[ns] type. So I tried
pd.read_csv(fname, dtype={
    'a': np.datetime64,
    'b': np.datetime64,
    'c': np.datetime64})
but the Pandas parser complains.
I want to avoid using the parse_dates option or post-processing the DataFrame with astype, because the CSV has 50 million rows and the conversion takes a long time each time I load it.
Is there a way to read the dates straight into datetime64[ns] types?
Update: As it turns out, reading the CSV with the parse_dates option (as suggested in the answers to the proposed duplicate) is performance-wise *not too bad*. On my machine, reading the 50 million records takes:
- without conversion to date objects: 2:30 min
- with conversion to date objects: 5:50 min
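For reference, here is a minimal sketch of the parse_dates call behind the second timing. It assumes the column names a, b, c from the sample above and uses a small in-memory string (csv_text, a made-up stand-in) in place of the real file, so it is an illustration rather than the exact code I timed:

```python
import io

import pandas as pd

# Hypothetical stand-in for the real 50-million-row file: two complete rows
# copied from the sample in the question.
csv_text = """a,b,c
1996-06-20,1996-06-21,1996-07-25
1998-06-22,1998-06-23,1998-06-23
"""

# parse_dates converts the listed columns to datetimes while reading,
# so no separate astype pass over the DataFrame is needed afterwards.
df = pd.read_csv(io.StringIO(csv_text), parse_dates=["a", "b", "c"])

# Each of the three columns should now report a datetime64 dtype.
print(df.dtypes)
```

With the real file, the io.StringIO(csv_text) argument would be replaced by fname. The exact datetime64 resolution reported may depend on the pandas version.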