I am trying to extract datetimes from file paths that follow arbitrary pattern formats. Some of these patterns repeat date format elements, for example %Y%m%d appearing more than once.

datetime.datetime.strptime is usually very handy for this but its underlying regex implementation precludes the use of repeated date format elements.

For example, running this code:

import datetime

filepath = '/backups/20190905/data-20190905-230001.tgz'
filepattern = '/backups/%Y%m%d/data-%Y%m%d-%H%M%S.tgz'

backup_time_stamp = datetime.datetime.strptime(filepath, filepattern)

Produces the following error:

Traceback (most recent call last):
  File "strp.py", line 11, in <module>
    backup_time_stamp = datetime.datetime.strptime(filepath, filepattern)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/_strptime.py", line 565, in _strptime_datetime
    tt, fraction = _strptime(data_string, format)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/_strptime.py", line 345, in _strptime
    format_regex = _TimeRE_cache.compile(format)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/_strptime.py", line 275, in compile
    return re_compile(self.pattern(format), IGNORECASE)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/re.py", line 233, in compile
    return _compile(pattern, flags)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/re.py", line 301, in _compile
    p = sre_compile.compile(pattern, flags)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/sre_compile.py", line 562, in compile
    p = sre_parse.parse(p, flags)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/sre_parse.py", line 856, in parse
    p = _parse_sub(source, pattern, flags & SRE_FLAG_VERBOSE, False)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/sre_parse.py", line 415, in _parse_sub
    itemsappend(_parse(source, state, verbose))
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/sre_parse.py", line 757, in _parse
    raise source.error(err.msg, len(name) + 1) from None
sre_constants.error: redefinition of group name 'Y' as group 4; was group 1 at position 101

This is a documented limitation of datetime.datetime.strptime. I am wondering what a possible workaround would be.

ChrisGuest

1 Answer

I would suggest splitting the filepath so that the datetime is extracted only from the file name, where each format element appears once:

import datetime

filepath = '/backups/20190905/data-20190905-230001.tgz'
filename = filepath.split('/')[-1]
filepattern = 'data-%Y%m%d-%H%M%S.tgz'

backup_time_stamp = datetime.datetime.strptime(filename, filepattern)

An alternative is the datetime-glob library, developed by marko, which parses date/times from paths using glob wildcard patterns intertwined with date/time formats.
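If an extra dependency is not an option, a standard-library workaround is also possible. The following is only a minimal sketch, not how strptime or datetime-glob works internally: it turns the pattern into a regex in which the first occurrence of each directive becomes a named group and every repeat becomes a backreference, then hands the captured values to strptime with each directive appearing once. The helper name strptime_repeated and the DIRECTIVE_PATTERNS table are hypothetical, and only the numeric directives from the question are covered.

```python
import datetime
import re

# Hypothetical table: regex fragment for each supported strptime directive.
# Only the fixed-width numeric directives used in the question are listed.
DIRECTIVE_PATTERNS = {
    'Y': r'\d{4}',
    'm': r'\d{2}',
    'd': r'\d{2}',
    'H': r'\d{2}',
    'M': r'\d{2}',
    'S': r'\d{2}',
}


def strptime_repeated(text, pattern):
    """Parse text with a strptime-style pattern that may repeat directives.

    Repeated directives must carry the same value everywhere they occur.
    """
    seen = []         # directives in order of first appearance
    regex_parts = []
    pos = 0
    for m in re.finditer(r'%(.)', pattern):
        # Literal text between directives is matched verbatim.
        regex_parts.append(re.escape(pattern[pos:m.start()]))
        code = m.group(1)
        if code == '%':
            regex_parts.append('%')
        elif code not in DIRECTIVE_PATTERNS:
            raise ValueError('unsupported directive: %' + code)
        elif code in seen:
            # Repeat: require the same text as the first occurrence.
            regex_parts.append('(?P=%s)' % code)
        else:
            seen.append(code)
            regex_parts.append('(?P<%s>%s)' % (code, DIRECTIVE_PATTERNS[code]))
        pos = m.end()
    regex_parts.append(re.escape(pattern[pos:]))

    match = re.fullmatch(''.join(regex_parts), text)
    if match is None:
        raise ValueError('%r does not match pattern %r' % (text, pattern))

    # Delegate validation and conversion to strptime, one directive each.
    fmt = ' '.join('%' + code for code in seen)
    value = ' '.join(match.group(code) for code in seen)
    return datetime.datetime.strptime(value, fmt)


filepath = '/backups/20190905/data-20190905-230001.tgz'
filepattern = '/backups/%Y%m%d/data-%Y%m%d-%H%M%S.tgz'
backup_time_stamp = strptime_repeated(filepath, filepattern)
```

Because repeats are backreferences, a path whose directory date and file-name date disagree raises ValueError rather than silently picking one of them. Whether that is the desired behavior depends on how the backups are laid out.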

henrywongkk

```matcher = datetime_glob.Matcher(pattern='/backups/%Y%m%d/data-%Y%m%d-%H%M%S.tgz') ; match = matcher.match(path='/backups/20190905/data-20190905-230001.tgz') ; match.as_datetime()``` – ChrisGuest Oct 15 '19 at 22:54