How to load a date column from a CSV straight as datetime64[ns] type into a Pandas DataFrame?

Asked Sep 03 '18 at 08:13

Active Sep 04 '18 at 01:55

I have a huge CSV file with date columns like this:

a,b,c
10-15,2008-10-20,2008-10-21
1996-06-20,1996-06-21,1996-07-25
1998-06-22,1998-06-23,1998-06-23
.
.
.

I want to read this file into a Pandas DataFrame while storing the dates directly as datetime64[ns]. So I tried

import numpy as np
import pandas as pd

pd.read_csv(fname, dtype={
             'a': np.datetime64,
             'b': np.datetime64,
             'c': np.datetime64 })

but the Pandas parser complains.

I want to avoid using the parse_dates option or post-processing the DataFrame with astype, because the CSV has 50 million rows and each time I load the CSV the conversion takes a long time.
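If the main cost is that the conversion is repeated on every load, one common workaround (not part of the original question) is to pay the parse_dates cost once and cache the resulting DataFrame in a format that preserves dtypes. The sketch below assumes pickle is an acceptable cache format; load_dates_cached, cache_path and the inline sample data are hypothetical names chosen for illustration, not pandas APIs.

```python
import io
import os
import tempfile

import pandas as pd


def load_dates_cached(csv_source, cache_path, date_cols):
    """Hypothetical helper (not a pandas API): parse the CSV once, reuse a pickle afterwards."""
    if os.path.exists(cache_path):
        # The pickle keeps the datetime64 dtypes, so no date parsing happens on later loads.
        return pd.read_pickle(cache_path)
    # First load only: convert the date columns while reading, then cache the result.
    df = pd.read_csv(csv_source, parse_dates=date_cols)
    df.to_pickle(cache_path)
    return df


# Two complete rows from the sample above stand in for the real 50-million-row file.
csv_text = "a,b,c\n1996-06-20,1996-06-21,1996-07-25\n1998-06-22,1998-06-23,1998-06-23\n"
cache_file = os.path.join(tempfile.mkdtemp(), "dates.pkl")

first = load_dates_cached(io.StringIO(csv_text), cache_file, ["a", "b", "c"])
# The cache now exists, so the CSV source is never read (None is passed to show that).
second = load_dates_cached(None, cache_file, ["a", "b", "c"])
print(second.dtypes)
```

Whether this helps depends on how often the same CSV is reloaded; the first load still costs as much as a plain parse_dates read.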

Is there a way to read the dates straight into datetime64[ns] types?

Update: As it turns out, reading the CSV with the parse_dates option (as suggested in the answers to the proposed duplicate) is not too bad performance-wise. On my machine, reading the 50 million records takes:

without conversion to date objects: 2:30 min
with conversion to date objects: 5:50 min
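For reference, a minimal sketch of the parse_dates variant timed above, using two complete rows from the sample in place of the real file (the inline csv_text is an assumption standing in for fname). The listed columns come back as datetime64 dtypes directly from read_csv, with no separate astype step; the exact resolution shown (datetime64[ns] in pandas 1.x/2.x) may differ in other pandas versions.

```python
import io

import pandas as pd

# Inline stand-in for the real CSV file; only fully formed dates are included.
csv_text = """a,b,c
1996-06-20,1996-06-21,1996-07-25
1998-06-22,1998-06-23,1998-06-23
"""

# parse_dates converts the named columns during parsing instead of afterwards.
df = pd.read_csv(io.StringIO(csv_text), parse_dates=["a", "b", "c"])
print(df.dtypes)
```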

halloleo

what error are you getting? – Alessandro Sep 03 '18 at 10:43