dateutil: default to the last occurrence of the recognized part, not the next

Question

I am using dateutil.parser.parse to parse date strings that might contain only partial information. Any fields missing from the string are filled in from parse's default keyword argument, which, if omitted, is the current date (datetime.datetime.today() with the time set to midnight).

For input like dateutil.parser.parse("Thursday"), this means it returns the date of the next Thursday (or today, if today is a Thursday). However, I need it to return the date of the last Thursday (including today, if today happens to be a Thursday).

So, assuming today == datetime.datetime(2018, 2, 20) (a Tuesday), I would like all of these asserts to pass:

```
from dateutil import parser
from datetime import datetime

def parse(date_str, default=None):
    # this needs to be modified
    return parser.parse(date_str, default=default)

today = datetime(2018, 2, 20)

assert parse("Tuesday", default=today) == today    # True
assert parse("Thursday", default=today) == datetime(2018, 2, 15)    # False
assert parse("Jan 31", default=today) == datetime(2018, 1, 31)    # True
assert parse("December 10", default=today) == datetime(2017, 12, 10)    # False
```

Is there an easy way to achieve this? With the current parse function, only the first and third asserts pass.

It doesn't default to last or next; it just replaces the components of the default with the ones it finds in the string. — Paul, Feb 20 '18 at 11:54
Check whether that weekday (date) has already passed; if it hasn't, subtract a week (a year). — Page David, Feb 20 '18 at 11:56
@Paul Hm, `dateutil.parser.parse("Thursday", default=datetime.datetime(2018, 12, 31)) == datetime.datetime(2019, 1, 3)`, though. — Graipher, Feb 20 '18 at 11:57
@Graipher "Thursday" with 2018/2/20 was parsed to 2018/2/22, and with 2018/12/31 to 2019/1/3. Both results are the first Thursday after the given day; does anything look strange? — Page David, Feb 20 '18 at 12:08
@DavidPage Well, that is just the behavior of `dateutil.parser.parse`, but it is not the one I need. I would need them to parse to 2018/02/15 and 2018/12/27. — Graipher, Feb 20 '18 at 12:40
@VikasDamodar I want `"Thursday"` to parse to the date of the last Thursday (with respect to some reference), `"Dec 10"` to the most recent day with that date (regardless of whether it fell this year or last), and so on. — Graipher, Feb 20 '18 at 13:06
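
For the weekday case, Page David's suggestion can be sketched with the standard library alone. This is a minimal illustration, not code from either post: last_weekday is a hypothetical helper, and it assumes the weekday name has already been mapped to an index using the same numbering as datetime.weekday() (0 = Monday ... 6 = Sunday).

```python
from datetime import datetime, timedelta


def last_weekday(reference, weekday):
    """Most recent date on or before `reference` that falls on `weekday`.

    `weekday` uses datetime.weekday() numbering (0 = Monday ... 6 = Sunday).
    """
    # Number of days to step back; 0 when reference already is that weekday
    days_back = (reference.weekday() - weekday) % 7
    return reference - timedelta(days=days_back)


print(last_weekday(datetime(2018, 2, 20), 3))   # 2018-02-15 00:00:00 (Thursday)
print(last_weekday(datetime(2018, 12, 31), 3))  # 2018-12-27 00:00:00 (Thursday)
print(last_weekday(datetime(2018, 2, 20), 1))   # 2018-02-20 00:00:00 (already a Tuesday)
```

If dateutil is already a dependency, adding dateutil.relativedelta.relativedelta(weekday=TH(-1)) to a date should perform the same step back, leaving a date that is already a Thursday unchanged.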

CristiFati · Accepted Answer · 2019-05-01T05:28:59.013

Here's your modified code (code.py):

```
#!/usr/bin/env python3

import sys
from dateutil import parser
from datetime import datetime, timedelta


today = datetime(2018, 2, 20)

data = [
    ("Tuesday", today, today),
    ("Thursday", datetime(2018, 2, 15), today),
    ("Jan 31", datetime(2018, 1, 31), today),
    ("December 10", datetime(2017, 12, 10), today),
]


def parse(date_str, default=None):
    # Original version from the question, kept for comparison
    return parser.parse(date_str, default=default)


def _days_in_year(year):
    # A year has 366 days exactly when February 29 exists in it
    try:
        datetime(year, 2, 29)
    except ValueError:
        return 365
    return 366


def parse2(date_str, default=None):
    dt = parser.parse(date_str, default=default)
    if default is not None:
        weekday_strs = [day_str.lower() for day_tuple in parser.parserinfo.WEEKDAYS for day_str in day_tuple]
        if date_str.strip().lower() in weekday_strs:
            # dateutil returns the first matching weekday on or after default,
            # so any later result means "go back one week" (this also handles
            # the wrap-around case, e.g. "Monday" with a Thursday default)
            if dt > default:
                dt -= timedelta(days=7)
        else:
            # Month / day only: compare against default, not the global today
            if (dt.month, dt.day) > (default.month, default.day):
                dt -= timedelta(days=_days_in_year(dt.year))
    return dt


def print_stats(parse_func):
    print("\nPrinting stats for \"{:s}\"".format(parse_func.__name__))
    for triple in data:
        d = parse_func(triple[0], default=triple[2])
        print("  [{:s}] [{:s}] [{:s}] [{:s}]".format(triple[0], str(d), str(triple[1]), "True" if d == triple[1] else "False"))


if __name__ == "__main__":
    print("Python {:s} on {:s}\n".format(sys.version, sys.platform))
    print_stats(parse)
    print_stats(parse2)
```

Notes:

I changed the structure of the code "a bit" to parametrize it, so that if a change is needed (e.g. adding a new example), the edits are minimal:
- Instead of asserts, I added a function (print_stats) that prints the results, rather than raising AssertionError and exiting the program at the first mismatch
  - It takes an argument (parse_func), which is the function that does the parsing (e.g. parse)
  - It uses the globally declared data (data) together with that function
- data is a list of triples, where each triple contains:
  1. Text to be converted
  2. Expected datetime ([Python 3.Docs]: datetime Objects) to be yielded by the conversion
  3. default argument to be passed to the parsing function (parse_func)
parse2 function (an improved version of parse):
- Accepts 2 types of date strings:
  1. Weekday name
  2. Month / Day (unordered)
- It does the regular parsing, and if the resulting object comes after the one passed as the default argument (determined by comparing the relevant attributes of the 2 objects), it subtracts one period (see [Python 3.Docs]: timedelta Objects):
  1. "Thursday" comes after "Tuesday", so it subtracts the number of days in a week (7)
  2. "December 10" comes after "February 20", so it subtracts the number of days in the year^*
- weekday_strs is easiest to explain by example:
```
>>> parser.parserinfo.WEEKDAYS
[('Mon', 'Monday'), ('Tue', 'Tuesday'), ('Wed', 'Wednesday'), ('Thu', 'Thursday'), ('Fri', 'Friday'), ('Sat', 'Saturday'), ('Sun', 'Sunday')]
>>> [day_str.lower() for day_tuple in parser.parserinfo.WEEKDAYS for day_str in day_tuple]
['mon', 'monday', 'tue', 'tuesday', 'wed', 'wednesday', 'thu', 'thursday', 'fri', 'friday', 'sat', 'saturday', 'sun', 'sunday']
```
  - Flattens parser.parserinfo.WEEKDAYS
  - Converts strings to lowercase (for simplifying comparisons)
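
Since weekday_strs does not depend on the input, it could also be built once at module level instead of on every call. A minimal sketch of that variation: WEEKDAYS is copied from the REPL output above so the snippet runs without dateutil (in real code you would use parser.parserinfo.WEEKDAYS), and WEEKDAY_STRS and is_weekday_name are hypothetical names. It uses itertools.chain.from_iterable for the flattening, a frozenset for the lookup, and strips surrounding whitespace before comparing.

```python
from itertools import chain

# Copied from parser.parserinfo.WEEKDAYS as shown above, so this runs without dateutil
WEEKDAYS = [("Mon", "Monday"), ("Tue", "Tuesday"), ("Wed", "Wednesday"),
            ("Thu", "Thursday"), ("Fri", "Friday"), ("Sat", "Saturday"),
            ("Sun", "Sunday")]

# Flatten the (abbreviation, full name) pairs and lowercase them once, at import time
WEEKDAY_STRS = frozenset(name.lower() for name in chain.from_iterable(WEEKDAYS))


def is_weekday_name(date_str):
    """True if date_str consists only of a weekday name, e.g. "thu" or " Thursday "."""
    return date_str.strip().lower() in WEEKDAY_STRS


print(is_weekday_name("Thursday"))  # True
print(is_weekday_name("Jan 31"))    # False
```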

_days_in_year^*: as you probably guessed, it returns the number of days in a year (I couldn't simply subtract 365, because leap years would shift the result by one day):

```
>>> dt = datetime(2018, 3, 1)
>>> dt
datetime.datetime(2018, 3, 1, 0, 0)
>>> dt - timedelta(365)
datetime.datetime(2017, 3, 1, 0, 0)
>>> dt = datetime(2016, 3, 1)
>>> dt
datetime.datetime(2016, 3, 1, 0, 0)
>>> dt - timedelta(365)
datetime.datetime(2015, 3, 2, 0, 0)
```
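
A per-year day count can still miss by one day: for a dt in January or February, the February being skipped belongs to dt.year - 1, so _days_in_year(dt.year) may be the wrong count (a dt of 2017-02-10 would come out as 2016-02-11). A minimal alternative sketch, not taken from the answer, is to change the year field directly with datetime.replace. same_date_previous_year is a hypothetical helper, and sending February 29 back to the previous leap year is an assumption about what "last occurrence" should mean for that date.

```python
from datetime import datetime, timedelta


def same_date_previous_year(dt):
    """Return dt with the year decreased by one, keeping month, day and time.

    February 29 has no counterpart in a non-leap year, so in that case keep
    stepping back until a leap year is reached (the previous February 29).
    """
    year = dt.year - 1
    while True:
        try:
            return dt.replace(year=year)
        except ValueError:  # only raised for Feb 29 in a non-leap year
            year -= 1


# A fixed day count drifts when the skipped span contains Feb 29 of the previous year
print(datetime(2017, 2, 10) - timedelta(days=365))     # 2016-02-11 00:00:00
print(same_date_previous_year(datetime(2017, 2, 10)))  # 2016-02-10 00:00:00
print(same_date_previous_year(datetime(2020, 2, 29)))  # 2016-02-29 00:00:00
```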

Output:

```
(py35x64_test) E:\Work\Dev\StackOverflow\q048884480>"e:\Work\Dev\VEnvs\py35x64_test\Scripts\python.exe" code.py
Python 3.5.4 (v3.5.4:3f56838, Aug  8 2017, 02:17:05) [MSC v.1900 64 bit (AMD64)] on win32


Printing stats for "parse"
  [Tuesday] [2018-02-20 00:00:00] [2018-02-20 00:00:00] [True]
  [Thursday] [2018-02-22 00:00:00] [2018-02-15 00:00:00] [False]
  [Jan 31] [2018-01-31 00:00:00] [2018-01-31 00:00:00] [True]
  [December 10] [2018-12-10 00:00:00] [2017-12-10 00:00:00] [False]

Printing stats for "parse2"
  [Tuesday] [2018-02-20 00:00:00] [2018-02-20 00:00:00] [True]
  [Thursday] [2018-02-15 00:00:00] [2018-02-15 00:00:00] [True]
  [Jan 31] [2018-01-31 00:00:00] [2018-01-31 00:00:00] [True]
  [December 10] [2017-12-10 00:00:00] [2017-12-10 00:00:00] [True]
```
