I'm using datetime.strptime to parse strings of the form %Y-%m-%dT%H:%M:%SZ into datetime values, but the data is dirty: sometimes the time component is missing, and sometimes the date arrives as yyyy/mm/dd instead of yyyy-mm-dd. I can think of hacky regex and try/except ways to handle this, but is there a clean way to use datetime.strptime and still get a datetime in '%Y-%m-%dT%H:%M:%SZ' form, with 00:00:00 (or some other default) as the time when the string has no time information?

Currently doing:

time = datetime.strptime(data['time'], '%Y-%m-%dT%H:%M:%SZ').replace(tzinfo=pytz.utc)

which throws an error if the data is in an unexpected format.

2 Answers

Just catch the ValueError and try again with an augmented value.

fmt = '%Y-%m-%dT%H:%M:%SZ'

try:
    time = datetime.strptime(data['time'], fmt)
except ValueError:
    time = datetime.strptime(data['time'] + "T00:00:00Z", fmt)

Alternatively, try the same string with a date-only format, since the resulting value will already default to 00:00:00.

date_and_time = '%Y-%m-%dT%H:%M:%SZ'
date_only = '%Y-%m-%d'
try: 
    time = datetime.strptime(data['time'], date_and_time)
except ValueError:
    time = datetime.strptime(data['time'], date_only)
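A quick check of that default, as a minimal sketch: the sample string is made up for illustration, and the standard library's timezone.utc stands in for the pytz.utc used in the question.

```python
from datetime import datetime, timezone

date_only = '%Y-%m-%d'
target = '%Y-%m-%dT%H:%M:%SZ'

# Hypothetical sample value with no time component.
parsed = datetime.strptime('2019-10-14', date_only)

# Fields that the format does not mention default to zero, so the time is midnight.
print(parsed)  # 2019-10-14 00:00:00

# Attach UTC and render the value back in the target format.
normalized = parsed.replace(tzinfo=timezone.utc).strftime(target)
print(normalized)  # 2019-10-14T00:00:00Z
```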

The second approach is a bit easier to adapt to multiple possible formats. Make a list of formats and try each one in turn until one succeeds; the else clause of the for loop runs only if none of them matched.

formats = ['%Y-%m-%dT%H:%M:%SZ', '%Y-%m-%d', ...]
for fmt in formats:
    try:
        time = datetime.strptime(data['time'], fmt)
        break
    except ValueError:
        pass
else:
    # raise ValueError(f'{data["time"]} does not match any expected format')
    time = datetime.now()  # Or some other completely artificial value
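The loop can also be wrapped in a small helper. The following is only a sketch under some assumptions: the name parse_timestamp and the tuple KNOWN_FORMATS are hypothetical, the slash format is taken from the yyyy/mm/dd variant mentioned in the question, and timezone.utc from the standard library is used in place of pytz.utc.

```python
from datetime import datetime, timezone

# Assumed set of formats, based on the variants described in the question;
# extend the tuple as new variants show up in the data.
KNOWN_FORMATS = (
    '%Y-%m-%dT%H:%M:%SZ',  # full timestamp
    '%Y-%m-%d',            # date only, dashes
    '%Y/%m/%d',            # date only, slashes
)


def parse_timestamp(raw, formats=KNOWN_FORMATS):
    """Return a UTC-aware datetime for the first format that matches raw."""
    for fmt in formats:
        try:
            parsed = datetime.strptime(raw, fmt)
        except ValueError:
            continue  # this format did not match; try the next one
        return parsed.replace(tzinfo=timezone.utc)
    raise ValueError(f'{raw!r} does not match any expected format')


# Hypothetical sample inputs covering the three variants.
for sample in ('2019-10-14T20:14:50Z', '2019-10-14', '2019/10/14'):
    print(parse_timestamp(sample).strftime('%Y-%m-%dT%H:%M:%SZ'))
```

This version raises when nothing matches rather than falling back to datetime.now(), so that unparseable input is not silently replaced with an artificial value; swap in a fallback if that suits the data better.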
chepner
  • A very nice approach, especially the loop solution – Charif DZ Oct 14 '19 at 18:24
  • Yeah, I thought of something like this but was wondering if there might be a less "work-around/hacky" sort of method, but it doesn't look like there is one. Thank you for answering! :) – Hyuga Hinata Oct 14 '19 at 21:31

If you're okay with third-party dependencies, you may also try the dateutil library:

import pytz
from dateutil import parser
time = parser.isoparse(data['time']).replace(tzinfo=pytz.utc)

Or, if you want to have more control over the default values:

from datetime import datetime
import pytz
from dateutil import parser
time = parser.parse(data['time'], default=datetime(2019, 10, 14, 20, 14, 50), yearfirst=True).replace(tzinfo=pytz.utc)

Both of them allow more missing fields in the date string (like YYYY or YYYY-MM, etc.). See https://dateutil.readthedocs.io/en/stable/parser.html for more details.

mhthies