
I'd like to define a Hypothesis strategy to generate a random datetime format.

Examples of what I'd like it to return:

  • '%Y-%m-%d'
  • '%d/%m/%Y %H:%M:%S'
  • '%m %d %Y %H:%M'
  • '%Y-%W-%a'

and so on.

It doesn't look like Hypothesis already provides anything for this.


Note: I'm looking for formats which would work with datetime.strptime. So, ideally, if this strategy returns a format fmt, then:

import datetime as dt

dt.datetime.strptime(dt.datetime.now().strftime(fmt), fmt)

shouldn't raise an error.

ignoring_gravity
  • Isn't there an infinite number of different datetime formats? I'm not familiar with Hypothesis but I can say `'%a%a%a%a%a'` is a valid format, no? – doneforaiur Jul 01 '23 at 07:29
  • The issue with that one is that `strptime` would fail with it: `datetime.strptime('WedWedWedWed', '%a%a%a%a')` throws an error. I'll clarify, thanks – ignoring_gravity Jul 01 '23 at 07:35
  • There's a `TimeRe` class in `_strptime.py`. If the Hypothesis route leads you nowhere, you can use the regular expressions to generate a valid datetime format. I'm really curious about this so I'll stick around. :^) – doneforaiur Jul 01 '23 at 07:52

1 Answer


I don't think anyone's asked for this before, so let's write a new strategy!

We're trying to generate a string, which consists of a sequence of zero (!) or more non-unique (!) format codes, optionally separated by arbitrary literals. If a literal contains %, it must be escaped by doubling to %%. So something like...

from __future__ import annotations  # lets `int | None` below work before Python 3.10

from hypothesis import strategies as st

# See https://docs.python.org/3/library/datetime.html#strftime-and-strptime-format-codes
# perhaps we can get a full list programmatically somehow?
FORMAT_CODES = ("%Y", "%m", "%d")  # etc.


def datetime_format_strings(
    *,
    literals: st.SearchStrategy[str] = st.text(),
    min_codes: int = 0,
    max_codes: int | None = None,
    unique: bool = True,  # False is OK for strftime, but not strptime
) -> st.SearchStrategy[str]:
    """Generate format strings for strftime and strptime."""
    return st.lists(
        st.tuples(
            st.sampled_from(FORMAT_CODES),
            literals.map(lambda x: x.replace(r"%", r"%%")),
        ),
        min_size=min_codes,
        max_size=max_codes,
        unique_by=(lambda p: p[0]) if unique else None,
    ).map(lambda xs: "".join(sum(xs, start=())[:-1]))


if __name__ == "__main__":
    for _ in range(10):
        print(datetime_format_strings().example())

Note that this assumes we wish to both start and end with a format code; it's trivial to adjust that on either side by changing the order of the tuples and how we chop the flattened sequence.
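To make the "order of the tuples and how we chop" point concrete, here is a minimal plain-Python sketch of just the flattening step, without Hypothesis. The helper name `flatten_pairs` and its `drop_last` flag are made up for illustration and are not part of the strategy above.

```python
def flatten_pairs(pairs, *, drop_last=True):
    """Join a list of 2-tuples of strings into a single string.

    Hypothetical helper: with (code, literal) pairs and drop_last=True the
    trailing literal is discarded, so the result starts and ends with a
    format code. With drop_last=False the trailing element is kept.
    """
    flat = [part for pair in pairs for part in pair]
    if drop_last:
        flat = flat[:-1]
    return "".join(flat)


# (code, literal) pairs, trailing literal chopped: code ... code
codes_first = flatten_pairs([("%Y", "-"), ("%m", "-"), ("%d", "!")])

# Same pairs, nothing chopped: code ... literal
keep_tail = flatten_pairs([("%Y", "-"), ("%m", "-"), ("%d", "!")], drop_last=False)

# (literal, code) pairs, nothing chopped: literal ... code
literals_first = flatten_pairs([("[", "%Y"), ("-", "%m")], drop_last=False)

print(codes_first, keep_tail, literals_first)
```

In the strategy itself, the same choice would be made by swapping the two arguments to `st.tuples(...)` and changing or removing the `[:-1]` slice.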

Conceptually, the trick here is to recognise that if we want a sequence of something, the st.lists() strategy is going to be our favored building block. It's then quite efficient to mess around with the formatting, and even to throw some small pieces away, so long as Hypothesis' engine can operate on it in terms of a list of elements.
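To check a generated format against the round-trip requirement from the question, something like the sketch below could be used. The helper name `round_trips` is an assumption for illustration. It catches `re.error` as well as `ValueError` because, as far as I can tell, a repeated directive makes `strptime` fail while compiling its internal regex rather than with a `ValueError`.

```python
import datetime as dt
import re


def round_trips(fmt, when=None):
    """Return True if strftime followed by strptime with `fmt` raises no error.

    Hypothetical helper; it only checks that parsing succeeds, not that the
    parsed value equals `when`.
    """
    when = when or dt.datetime.now()
    try:
        dt.datetime.strptime(when.strftime(fmt), fmt)
    except (ValueError, re.error):
        return False
    return True


moment = dt.datetime(2023, 7, 1, 7, 29)
print(round_trips("%Y-%m-%d", moment))
print(round_trips("%d/%m/%Y %H:%M:%S", moment))
print(round_trips("%Y %Y", moment))  # repeated directive
```

Inside a Hypothesis test, the same check could be applied to each value drawn from `datetime_format_strings()`; whether every format passes will depend on which codes and literals are allowed.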

Zac Hatfield-Dodds