I have a list of strings representing datetimes in different formats, e.g.:
list_date_str = ['2021010112', '202101011210']
The first should translate to 2021-01-01 12:00, the second to 2021-01-01 12:10. Without giving much thought to it I wrote this snippet:
import datetime as dt

for date_str in list_date_str:
    try:
        date = dt.datetime.strptime(date_str, '%Y%m%d%H%M')
    except ValueError:
        date = dt.datetime.strptime(date_str, '%Y%m%d%H')
    print(date)
After a painstaking bug search I realized that the first string is not parsed as expected. The code gives:
2021-01-01 01:02:00
2021-01-01 12:10:00
I do understand what is happening: The except-block is never reached. Instead the penultimate character of '2021010112' is interpreted as the hour-digit and the last character is interpreted as the minute-digit.
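To isolate the behavior from the loop, here is a minimal check of the single call that surprised me. It assumes the standard library datetime module on CPython; the variable name parsed is only for illustration.

```python
import datetime as dt

# The 10-character string is accepted by the 12-character format:
# %H consumes only '1' and %M consumes only '2'.
parsed = dt.datetime.strptime('2021010112', '%Y%m%d%H%M')
print(parsed.hour, parsed.minute)
```

So the hour and minute fields each end up with a single digit, and no exception is raised.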
Is this the intended behavior? The datetime documentation clearly states that both %H and %M are zero-padded decimal numbers.
Am I misunderstanding something, or is the documentation misleading? Why does the try block not raise a ValueError?
Is there a convenient and robust way to tackle this issue? I know that in this particular case the code can be fixed by swapping the formats in the try and except blocks, but that can't be the right way to do it.
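For comparison, the only other approach I could come up with is to pick the format from the string length instead of relying on an exception. This is just a sketch under the assumption that every format has a distinct, fixed length; FORMAT_BY_LENGTH and parse_date_str are names I made up, not part of any library.

```python
import datetime as dt

# Hypothetical lookup table: exact string length -> strptime format.
# It only covers the two formats from my example.
FORMAT_BY_LENGTH = {
    10: '%Y%m%d%H',    # e.g. '2021010112'
    12: '%Y%m%d%H%M',  # e.g. '202101011210'
}


def parse_date_str(date_str):
    """Parse date_str with the format whose length matches exactly."""
    try:
        fmt = FORMAT_BY_LENGTH[len(date_str)]
    except KeyError:
        # Reject anything that does not have one of the known lengths.
        raise ValueError(f'unsupported datetime string: {date_str!r}') from None
    return dt.datetime.strptime(date_str, fmt)


for date_str in ['2021010112', '202101011210']:
    print(parse_date_str(date_str))
```

This parses both strings as I expect (12:00 and 12:10), but it breaks down as soon as two formats share the same length, so I am not sure it counts as robust either.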
PS: This issue also applies to pd.to_datetime.