Simpler way to draw a list of compound values with requirements of uniqueness and completeness?

Question

Given an object I'm trying to construct:

from dataclasses import dataclass    

@dataclass
class Thing:
    thing_id: str
    a: int
    b: float
    # other attributes

For property-based testing I need to generate lists of Thing such that

thing_id is unique over the list
a and b are paired values from a list [(a0, b0), (a1, b1), ...]
from that list of (a, b) every value is drawn at least once
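
The three requirements above can be written down as a plain predicate, which is handy for checking any candidate strategy. This is only a sketch: the name `satisfies_requirements` is hypothetical, and `Thing` is repeated here without its other attributes so the snippet stands alone.

```python
from dataclasses import dataclass


@dataclass
class Thing:
    thing_id: str
    a: int
    b: float


def satisfies_requirements(things, ab):
    """Return True if `things` meets the three requirements for the pairs in `ab`.

    Hypothetical helper, not part of Hypothesis.
    """
    ids = [t.thing_id for t in things]
    pairs = [(t.a, t.b) for t in things]
    unique_ids = len(ids) == len(set(ids))          # ids are unique over the list
    only_known_pairs = all(p in ab for p in pairs)  # (a, b) always comes from ab
    every_pair_used = all(p in pairs for p in ab)   # each pair in ab appears at least once
    return unique_ids and only_known_pairs and every_pair_used
```

A list such as `[Thing("x", 1, 1.0), Thing("y", 2, 2.0), Thing("z", 1, 1.0)]` passes for `ab = [(1, 1.0), (2, 2.0)]`, while a list with a repeated id or a missing pair does not.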

This is what I've come up with:

from hypothesis import strategies as st

def thing_lists(ab):
    # ab = [(a0, b0), (a1, b1), ...]
    vals = (
        # list that has at least all values from ab
        st.lists(st.sampled_from(ab))
        .flatmap(lambda sample: st.permutations(sample + ab))
        # add ids
        .flatmap(
            # get unique ids
            lambda abs: st.lists(
                st.text(), min_size=len(abs), max_size=len(abs), unique=True
            # zip with abs
            ).map(lambda ids: [(id, *ab) for id, ab in zip(ids, abs)])
        )
    )
    return vals.flatmap(
        lambda idabs: st.tuples(
            *[
                st.builds(
                    Thing,
                    st.just(id),
                    st.just(a),
                    st.just(b),
                    st.just(b),
                    # other attributes by free choice of hypothesis
                )
                for (id, a, b) in idabs
            ]
        ).map(list)
    )

This works, but drawing a tuple and then mapping it to a list feels convoluted. Am I missing a different technique that makes it clearer what's going on?

Can you provide some sample input and output? What does `list_of_things = [Thing(i,ab[0],ab[1]) for i,ab in enumerate(ab)]` not give you, for instance? — JeffUK, Dec 06 '21 at 05:36
If the code is already working, maybe [codereview.se] is more suitable? (read the help center before asking) — user202729, Dec 06 '21 at 07:02
@JeffUK it doesn't give me (a) more than just the minimum number of values, (b) free(er) choice over other attributes, which are important in the context of property-based testing. Edited to add that context to the question, thanks. — Andrea Reina, Dec 06 '21 at 07:21
@user202729 the question's more about how to (better) use this particular tool, which seems more in-line with the questions I see here than there. I did think about asking the broader how and then answering with my could-be-improved answer. — Andrea Reina, Dec 06 '21 at 07:36

score 1 · Answer 1 · answered Dec 11 '21 at 03:41

That looks basically reasonable to me - it's pretty grungy, but mostly because of your requirements. I'd also try

vals = st.lists(
    st.tuples(st.text(), st.sampled_from(ab)),
    min_size=len(ab),
    unique_by=lambda x: x[0],
).filter(lambda ls: {x for _, x in ls}.issuperset(ab))

but obviously that filter is going to be a tough constraint if ab is long, so I'd only expect a perf win if it's short (but measure!).
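
For reference, here is one way the snippet above might be wrapped into a complete strategy and exercised in a test. It is a sketch under assumptions: `thing_lists_filtered` and the test name are hypothetical, `Thing` is reduced to the three fields shown in the question, and the `ab` passed to the test is a made-up two-element example chosen so that the filter rarely rejects.

```python
from dataclasses import dataclass

from hypothesis import given, settings, strategies as st


@dataclass
class Thing:
    thing_id: str
    a: int
    b: float


def thing_lists_filtered(ab):
    """Lists of Thing with unique ids that use every (a, b) pair in `ab`."""
    rows = st.lists(
        st.tuples(st.text(), st.sampled_from(ab)),
        min_size=len(ab),               # fewer rows could never cover ab
        unique_by=lambda row: row[0],   # ids must be unique
    ).filter(lambda ls: {pair for _, pair in ls}.issuperset(ab))
    # Turn each (id, (a, b)) row into a Thing.
    return rows.map(lambda ls: [Thing(i, a, b) for i, (a, b) in ls])


@settings(max_examples=50)
@given(thing_lists_filtered([(1, 1.0), (2, 2.0)]))
def test_filtered_lists_cover_all_pairs(things):
    ids = [t.thing_id for t in things]
    assert len(ids) == len(set(ids))
    assert {(t.a, t.b) for t in things} == {(1, 1.0), (2, 2.0)}
```

With a longer `ab` the filter rejects more draws, so Hypothesis may start reporting a health-check failure for filtering too much; that is the point at which the original flatmap-based version is likely the better choice.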

If this does work, you'll get faster and sometimes better shrinking than the above; inlining the builds() part to make the generation of each list element fully local would improve this even further but might require a new constructor. None of this is worth sacrificing readability for, though!
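
A minimal sketch of what "inlining the builds() part" could look like, assuming a small helper constructor that unpacks the `(a, b)` pair. The names `make_thing` and `thing_lists_inlined` are hypothetical, and the `note` field is only a stand-in for the "other attributes" in the question.

```python
from dataclasses import dataclass

from hypothesis import strategies as st


@dataclass
class Thing:
    thing_id: str
    a: int
    b: float
    note: str  # stand-in for "other attributes"


def make_thing(thing_id, pair, note):
    """Hypothetical constructor: accepts (a, b) as a single paired argument."""
    a, b = pair
    return Thing(thing_id, a, b, note)


def thing_lists_inlined(ab):
    # Each element is built entirely inside the st.lists() call.
    element = st.builds(
        make_thing,
        st.text(),            # thing_id
        st.sampled_from(ab),  # paired (a, b)
        note=st.text(),       # free choice for the other attribute
    )
    return st.lists(
        element,
        min_size=len(ab),
        unique_by=lambda t: t.thing_id,
    ).filter(lambda things: {(t.a, t.b) for t in things}.issuperset(ab))
```

The completeness requirement is still enforced by a filter here, so the same caveat about long `ab` applies.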

I'm not following what you mean by inlining `builds()` to make generation of each list element fully local, could you expand on that a bit? — Andrea Reina, Dec 13 '21 at 06:34
If you can arrange things so that *everything* is inside a single `st.lists()` strategy, then Hypothesis can swap around elements (etc). This is neat, but probably not applicable to your problem :/ — Zac Hatfield-Dodds, Dec 13 '21 at 08:40
