I have a list of strings representing datetimes in different formats, e.g.:

list_date_str = ['2021010112', '202101011210']

The first should translate to 2021-01-01 12:00, the second to 2021-01-01 12:10. Without giving it much thought, I wrote this snippet:

import datetime as dt

for date_str in list_date_str:
    try:
        date = dt.datetime.strptime(date_str, '%Y%m%d%H%M')
    except ValueError:
        date = dt.datetime.strptime(date_str, '%Y%m%d%H') 
    print(date)

After a painstaking bug search I realized that the first string is not parsed as expected. The code gives:

2021-01-01 01:02:00
2021-01-01 12:10:00

I do understand what is happening: the except-block is never reached. Instead, the penultimate character of '2021010112' is interpreted as the hour and the last character as the minute.

Is this the intended behavior? The datetime docs clearly state that %H, like %M, denotes a zero-padded decimal number.

Am I not getting it or is the doc just misleading? Why does the try-block not raise a ValueError?

Is there a convenient and robust way to tackle this issue? I know that in this particular case the code can be fixed by swapping the formats used in the try- and the except-block. But this can't be the right way to do it.

PS: This issue also applies to pd.to_datetime.

Durtal

3 Answers


Use len to get the string length and look up the matching format in a dict.

Example:

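import datetime

list_date_str = ['2021010112', '202101011210']

# Map each supported string length to its strptime format.
frmt = {10: '%Y%m%d%H', 12: '%Y%m%d%H%M'}
for date_str in list_date_str:
    try:
        print(datetime.datetime.strptime(date_str, frmt[len(date_str)]))
    except KeyError:
        # No format is registered for a string of this length.
        raise ValueError(f"Date format not found for {date_str!r}.") from None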
Rakesh

Perhaps the easiest way would be to zero pad your datetime strings when required:

import datetime as dt

list_date_str = ['2021010112', '202101011210']

for date_str in list_date_str:
    try:
        date = dt.datetime.strptime(f'{date_str:0<12}', '%Y%m%d%H%M')
    except ValueError:
        print(f'Failed to convert {date_str!r}')
        continue 
    print(date)

Here the f-string f'{date_str:0<12}' left-aligns the string in a field of width 12 and fills the remaining positions on the right with zeros. This also permits parsing of shorter strings that have no time component at all:

>>> list_date_str = ['2021010112', '202101011210', 'baddate', '20210101', '2021']
>>> for date_str in list_date_str:
...     try:
...         date = dt.datetime.strptime(f'{date_str:0<12}', '%Y%m%d%H%M')
...     except ValueError:
...         print(f'Failed to convert {date_str!r}')
...         continue 
...     print(date)
... 
2021-01-01 12:00:00
2021-01-01 12:10:00
Failed to convert 'baddate'
2021-01-01 00:00:00
Failed to convert '2021'
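
To see why '20210101' parses but '2021' does not, it can help to look at the padded strings that strptime actually receives. This is only an illustrative sketch built on the same inputs as above; the name padded is introduced here for the demonstration.

```python
list_date_str = ['2021010112', '202101011210', '20210101', '2021']

# '0<12' means: fill character '0', left-align, minimum width 12.
padded = [f'{date_str:0<12}' for date_str in list_date_str]

for original, result in zip(list_date_str, padded):
    print(original, '->', result)
```

The last entry becomes '202100000000', i.e. month 00 and day 00, which is not a valid date, so strptime raises ValueError for it.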
mhawke

I suspect the documentation describes string formatting more accurately than string parsing.
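
A small sketch of that asymmetry, using only the standard library: strftime always writes two digits for %H and %M, while strptime also accepts a single digit for them. As far as I can tell, the technical notes below the format-code table in the datetime docs mention that the leading zero is optional when parsing. The variable names here are made up for the demonstration.

```python
import datetime as dt

# Formatting: %H and %M are always written zero-padded.
stamp = dt.datetime(2021, 1, 1, 1, 2)
formatted = stamp.strftime('%Y%m%d%H%M')
print(formatted)  # 202101010102

# Parsing: a single digit is accepted for %H and %M, so the ten-character
# string from the question still matches the twelve-character format.
parsed = dt.datetime.strptime('2021010112', '%Y%m%d%H%M')
print(parsed)  # 2021-01-01 01:02:00

# The same leniency is visible with explicit separators.
loose = dt.datetime.strptime('1:2', '%H:%M')
print(loose.hour, loose.minute)  # 1 2
```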

In your case, the actual problem is that your data is inconsistently formatted. I would not rely on a parsing attempt failing to determine what format it should be parsed in. Instead, you should explicitly check e.g. the length of your string to decide what format you want to use for parsing it. This also allows you to gracefully handle more than just the two cases you described here.
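
A minimal sketch of such an explicit check, assuming the inputs are digits-only strings whose length identifies the format. The names FORMATS_BY_LENGTH and parse_compact are hypothetical, and the 8- and 14-character entries are assumed extensions beyond the two cases in the question.

```python
import datetime as dt

# Hypothetical mapping from string length to strptime format; extend as needed.
FORMATS_BY_LENGTH = {
    8: '%Y%m%d',
    10: '%Y%m%d%H',
    12: '%Y%m%d%H%M',
    14: '%Y%m%d%H%M%S',
}


def parse_compact(date_str):
    """Parse a digits-only timestamp, choosing the format by its length."""
    if not date_str.isdigit() or len(date_str) not in FORMATS_BY_LENGTH:
        # Fail loudly instead of letting strptime guess a format.
        raise ValueError(f'Unsupported datetime string: {date_str!r}')
    return dt.datetime.strptime(date_str, FORMATS_BY_LENGTH[len(date_str)])


for date_str in ['2021010112', '202101011210']:
    print(parse_compact(date_str))
```

Because the format is chosen before parsing, a string of unexpected length is rejected up front rather than being silently squeezed into the wrong format.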

Johan