1

I have a csv file with a datetime field called "Reading Time" that looks like the following "2020-09-01 00:06:52 +0000 UTC".

When using any of the following to_datetime function in Pandas, I get the following error depending on the function arguments I use:

df['Reading Time'] =  pd.to_datetime(df['Reading Time'], format='%Y-%m-%d %H:%M:%S')
df['Reading Time'] =  pd.to_datetime(df['Reading Time'], format='%Y-%m-%d %H:%M:%S', exact=False)
df['Reading Time'] =  pd.to_datetime(df['Reading Time'], format='%Y-%m-%d %H:%M:%S %Z')
df['Reading Time'] =  pd.to_datetime(df['Reading Time'], format='%Y-%m-%d %H:%M:%S %Z', exact=False)

ValueError: time data 2020-09-01 00:06:52 +0000 UTC doesn't match format specified

If I then try the "coerce" argument...

df['Reading Time'] =  pd.to_datetime(df['Reading Time'], format='%Y-%m-%d %H:%M:%S %Z', errors='coerce')

...all the Reading Time values are returned as "NaT"

Do I need to strip out the "+0000 UTC" first before parsing?

Thanks in advance.

Atyeo
  • 21
  • 4
  • Do you have more examples of data in this format for other time zones? Does the `UTC` part **actually change** if the time is for a different time zone? – Karl Knechtel Feb 17 '23 at 02:27

2 Answers2

2

The issue may be the format you are providing.

You include %Z for the time zone name (UTC), but no %z for the UTC offset (+0000).

df['Reading Time'] =  pd.to_datetime(df['Reading Time'], format='%Y-%m-%d %H:%M:%S %z %Z')

Try that instead.

Here are the docs for the format parameter: https://docs.python.org/3/library/datetime.html#strftime-and-strptime-behavior

AJD
  • 66
  • 6
  • That's for the suggestion @aj-davies. I tried the above and received the following error: ValueError: Cannot parse both %Z and %z – Atyeo Jan 25 '22 at 10:06
1

I ran into the same problem with Pandas not parsing both %z and %Z, so I used this regex to remove the timezone name from the end of the string:

strs = df['Reading Time'].apply(lambda x: re.sub(' \w+$','',x))
df['Reading Time'] = pd.to_datetime( strs )

The regex matches a space followed by any number of alpha letters up to the end of the string, so UTC, EST, etc. Then Pandas will automatically parse the format without you needing to specify.

Adam Payne
  • 11
  • 1