2

How do you convert a column of dates of the form "2020-06-30 15:20:13.078196+00:00" to datetime in pandas?

This is what I have done:

  1. pd.concat([df, df.date_string.apply(lambda s: pd.Series({'date':datetime.strptime(s, '%Y-%m-%dT%H:%M:%S.%f%z')}))], axis=1)
  2. pd.concat([df, df.file_created.apply(lambda s: pd.Series({'date':datetime.strptime(s, '%Y-%m-%dT%H:%M:%S.%f.%z')}))], axis=1)
  3. pd.concat([df, df.file_created.apply(lambda s: pd.Series({'date':datetime.strptime(s, '%Y-%m-%dT%H:%M:%S.%f:%z')}))], axis=1)

I get the error - time data '2020-06-30 15:20:13.078196+00:00' does not match format in all cases. Any help is appreciated.

Nilima
  • 197
  • 1
  • 2
  • 9
  • 2
    Your format is looking for a literal character `T` that separates the date from time. You should either replace that with a space, or your input data isn't consistent with using `T`. – chepner Oct 09 '21 at 12:27
  • I removed the T, still getting the same error - `time data '2020-06-30 15:20:13.078196+00:00' does not match format` – Nilima Oct 11 '21 at 14:59
  • Did you remove the T, or replace it with a space? – chepner Oct 11 '21 at 15:12

2 Answers2

4

None of the formats mentioned by you above matches your sample. Try this

"%Y-%m-%d %H:%M:%S.%f%z" (Notice the space before %H).

Puneeth R
  • 95
  • 5
4

+00:00 is a UTC offset of zero hours, thus can (should) be interpreted as UTC. The easiest thing to do is let pd.to_datetime auto-infer the format. That works very well for standard formats like this (ISO 8601):

import pandas as pd

dti = pd.to_datetime(["2020-06-30 15:20:13.078196+00:00"])

print(dti)
# DatetimeIndex(['2020-06-30 15:20:13.078196+00:00'], dtype='datetime64[ns, UTC]', freq=None)

Notes

  • pd.to_datetime also works very well for mixed formats, see e.g. here.
  • setting infer_datetime_format=True can give some better performance if the date/time strings in the column you're converting only have one format (all strings formatted the same way)
  • pandas v2: you can set `format="ISO8601" to help with performance
FObersteiner
  • 22,500
  • 8
  • 42
  • 72
  • 1
    Thank you so much! This works. For others who come searching - this is what I did: `df['date'] = pd.to_dataframe(df.string_date, infer_datetime_format = True)` – Nilima Oct 11 '21 at 10:10
  • 1
    @Nilima glad I could help; setting `infer_datetime_format=True` can give some better performance if the date/time strings in the column you're converting only have one format (all strings formatted the same way). – FObersteiner Oct 11 '21 at 10:30