I have a huge CSV file with date columns like this:

a,b,c
10-15,2008-10-20,2008-10-21
1996-06-20,1996-06-21,1996-07-25
1998-06-22,1998-06-23,1998-06-23
.
.
.

I want to read this file into a Pandas DataFrame and store the dates directly as datetime64[ns]. So I tried

pd.read_csv(fname, dtype={
             'a': np.datetime64,
             'b': np.datetime64,
             'c': np.datetime64 })

but the Pandas parser complains.

I want to avoid using the parse_dates option or post-processing the DataFrame with astype, because the CSV has 50 million rows and each time I load the CSV the conversion takes a long time.

Is there a way to read the dates straight into datetime64[ns] types?


Update: As it turns out, reading the CSV with the parse_dates option (as suggested in the answers to the proposed duplicate) is performance-wise *not too bad*: on my machine, reading the 50 million records takes

  • without conversion to date objects 2:30 min
  • with conversion to date objects 5:50 min
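
For reference, a minimal sketch of what the parse_dates variant looks like. It uses a small in-memory sample (two of the rows shown above) in place of the real file, so `csv_text` is a stand-in and not the actual data. The exact datetime resolution of the resulting columns may depend on the pandas version.

```python
import io

import pandas as pd

# Hypothetical stand-in for the real 50-million-row CSV file.
csv_text = """a,b,c
1996-06-20,1996-06-21,1996-07-25
1998-06-22,1998-06-23,1998-06-23
"""

# parse_dates asks the parser to convert the listed columns to datetimes
# while reading, so no separate astype step is needed afterwards.
df = pd.read_csv(io.StringIO(csv_text), parse_dates=["a", "b", "c"])

# Show the resulting column dtypes.
print(df.dtypes)
```

With the real file, `io.StringIO(csv_text)` would be replaced by the file name.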
halloleo