I don't think anyone's asked for this before, so let's write a new strategy!
We're trying to generate a string, which consists of a sequence of zero (!) or more non-unique (!) format codes, optionally separated by arbitrary literals. If a literal contains %, it must be escaped by doubling to %%. So something like...
from hypothesis import strategies as st

# See https://docs.python.org/3/library/datetime.html#strftime-and-strptime-format-codes
# perhaps we can get a full list programmatically somehow?
FORMAT_CODES = ("%Y", "%m", "%d")  # etc.


def datetime_format_strings(
    *,
    literals: st.SearchStrategy[str] = st.text(),
    min_codes: int = 0,
    max_codes: int | None = None,
    unique: bool = True,  # False is OK for strftime, but not strptime
) -> st.SearchStrategy[str]:
    """Generate format strings for strftime and strptime."""
    return st.lists(
        st.tuples(
            st.sampled_from(FORMAT_CODES),
            literals.map(lambda x: x.replace(r"%", r"%%")),
        ),
        min_size=min_codes,
        max_size=max_codes,
        unique_by=(lambda p: p[0]) if unique else None,
    ).map(lambda xs: "".join(sum(xs, start=())[:-1]))


if __name__ == "__main__":
    for _ in range(10):
        print(datetime_format_strings().example())
Note that this assumes we want the result to both start and end with a format code; it's trivial to adjust that on either side by changing the order of the tuples and how we chop the flattened sequence.
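To make the "order of the tuples and how we chop" remark concrete, here is a small sketch of just the flattening step, run on hand-written pairs rather than generated ones. The `flatten` helper and the sample `pairs` are hypothetical names for illustration; they are not part of the strategy above.

```python
def flatten(pairs, drop_last=False):
    """Concatenate a list of 2-tuples of strings, optionally dropping the final piece."""
    flat = sum(pairs, start=())  # [("a", "b"), ("c", "d")] -> ("a", "b", "c", "d")
    return "".join(flat[:-1] if drop_last else flat)


# Hypothetical sample: (format code, already-escaped literal) pairs.
pairs = [("%Y", "-"), ("%m", "-"), ("%d", "T")]

# Code-first tuples, last literal chopped: starts and ends with a code.
code_to_code = flatten(pairs, drop_last=True)  # "%Y-%m-%d"

# Code-first tuples, nothing chopped: starts with a code, ends with a literal.
code_to_literal = flatten(pairs)  # "%Y-%m-%dT"

# Literal-first tuples (order swapped): starts with a literal, ends with a code.
swapped = [(literal, code) for code, literal in pairs]
literal_to_code = flatten(swapped)  # "-%Y-%mT%d"
```

Chopping `flat[1:]` instead of `flat[:-1]` on the swapped pairs would give the code-to-code shape again, so either tuple order can produce any of these variants.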
Conceptually, the trick here is to recognise that if we want a sequence of something, the st.lists() strategy is going to be our favored building block. It's then quite efficient to mess around with the formatting and even throw some small pieces away, so long as Hypothesis' engine can operate on it in terms of a list of elements.
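One way to gain confidence in the list-then-map approach is a property test on the generated strings. The sketch below repeats a condensed copy of the strategy so it runs on its own; the `WELL_FORMED` pattern and the two test names are my own illustrative choices, and the pattern only knows the three sample codes used here, so it would need extending along with FORMAT_CODES.

```python
import re

from hypothesis import given, strategies as st

FORMAT_CODES = ("%Y", "%m", "%d")  # same illustrative subset as above


def datetime_format_strings(literals=st.text(), unique=True):
    """Condensed copy of the strategy above, so this sketch is self-contained."""
    return st.lists(
        st.tuples(
            st.sampled_from(FORMAT_CODES),
            literals.map(lambda x: x.replace("%", "%%")),
        ),
        unique_by=(lambda p: p[0]) if unique else None,
    ).map(lambda xs: "".join(sum(xs, start=())[:-1]))


# A well-formed result is a run of tokens: a known code, an escaped "%%",
# or any single character other than "%".
WELL_FORMED = re.compile(r"(?:%[Ymd%]|[^%])*")


@given(datetime_format_strings())
def test_percent_signs_are_codes_or_escaped(fmt):
    # Every "%" must begin either a known format code or a doubled "%%".
    assert WELL_FORMED.fullmatch(fmt)


@given(datetime_format_strings(literals=st.just("")))
def test_unique_codes_without_literals(fmt):
    # With empty literals the string is just the codes, so uniqueness is easy to check.
    codes = re.findall(r"%[Ymd]", fmt)
    assert len(codes) == len(set(codes))


if __name__ == "__main__":
    test_percent_signs_are_codes_or_escaped()
    test_unique_codes_without_literals()
```

The second test passes empty literals on purpose: a literal such as "%Y" is escaped to "%%Y", which still contains the substring "%Y", so counting codes by substring search is only safe once literals are out of the picture.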