I have some code see EOM; it's by no means final but is the best way (so far) I've seen/conceived for validating multiple date formats in a somewhat performant way.
I'm wondering if there is a means to pass an additional argument to this kind of function (_normalise_coerce), it would be nice if the date format string could be defined in the schema. something like
{
"a_date":{
"type": "datetime",
"coerce": "to_datetime",
"coerce_args": "%m/%d/%Y %H:%M"
}
}
Vs making a code change in the function to support an additional date format. I've looked through the docs and not found anything striking. Fairly good chance I'm looking at this all wrong but figured asking the experts was the best approach. I think defining within the schema is the cleanest solution to the problem, but I'm all eyes and ears for facts, thoughts and opinions.
Some context:
- Performance is essential as this could be running against millions of rows in AWS lambdas (and Cerbie (my nickname for cerberus) isn't exactly a spring chicken :P ).
- None of the schemas will be native python dicts as they're all defined in JSON/YAML, so it all needs to be string friendly.
- Not using the built-in coercion as the python types cannot be parsed from strings
- I don't need the datetime object, so regex is a possibility, just less explicit and less futureproof.
- If this is all wrong and I'm grossly incompetent, please be gentle (づ。◕‿‿◕。)づ
def _normalize_coerce_to_datetime(self, value: Union(str, datetime, None)) -> Union(datetime, str, None):
'''
Casts valid datetime strings to the datetime python type.
:param value: (str, datetime, None): python datetime, datetime string
:return: datetime, string, None. python datetime,
invalid datetime string or None if the value is empty or None
'''
datetime_formats = ['%m/%d/%Y %H:%M']
if isinstance(value, datetime):
return value
if value and not value.isspace():
for format in datetime_formats:
try:
return datetime.strptime(value, format)
except ValueError:
date_time = value
return date_time
else:
return None