Create hypothesis strategy that returns unique values

Question

I'm trying to create a hypothesis strategy which produces integers with no repeats. Here's my code:

import hypothesis
import hypothesis.strategies as strategies

def unique(strat):
    previous = set()

    @strategies.composite
    def new_strategy(draw):
        while True:
            value = draw(strat)
            if value not in previous:
                previous.add(value)
                return value

    return new_strategy

strategy = unique(strategies.integers(min_value=0, max_value=1000))

@hypothesis.given(num=strategy)
def test_unique(num):
    pass

However, when I run pytest, I get

    @check_function
    def check_strategy(arg, name=""):
        if not isinstance(arg, SearchStrategy):
            hint = ""
            if isinstance(arg, (list, tuple)):
                hint = ", such as st.sampled_from({}),".format(name or "...")
            if name:
                name += "="
            raise InvalidArgument(
                "Expected a SearchStrategy%s but got %s%r (type=%s)"
                % (hint, name, arg, type(arg).__name__)
            )
E           hypothesis.errors.InvalidArgument: Expected a SearchStrategy but got mapping['num']=<function accept.<locals>.new_strategy at 0x7f30622418b0> (type=function)

Also, how would your strategy handle strats with a finite set of values, such as `hypothesis.strategies.booleans()` or `hypothesis.strategies.integers(0, 5)`? — NicholasM, Sep 15 '22 at 20:34
@NicholasM, I admit I've thought of this but don't have an answer yet. For my use case, I'll just make sure to not make the sample size too large. — Daniel Walker, Sep 15 '22 at 20:36

Zac Hatfield-Dodds · Answer 1 · 2023-02-04T07:21:52.383


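from hypothesis import strategies as st

@st.composite
def unique(draw, strategy):
    # One `seen` set per test case, shared by every draw that uses this key.
    seen = draw(st.shared(st.builds(set), key="key-for-unique-elems"))
    return draw(
        strategy
        .filter(lambda x: x not in seen)   # reject values already drawn
        .map(lambda x: seen.add(x) or x)   # record the value, then return it
    )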

There are a couple of cute tricks here:

- Use st.shared() to create a new seen cache for each distinct example. This fixes your "what if you run out of values" problem, but also fixes your critical "the test can't replay failures" problem which would make the whole thing horribly flaky.
  - For advanced tricks, try using the key= argument to shared, or having it construct a dictionary.
- .filter(...) to exclude already-seen items. This is better than a loop because it means Hypothesis can more effectively avoid and report on futile attempts to generate a not-yet-seen example.
- The .map(...) call to add it to the set is a blatant abuse of the facts that set.add() returns None after mutating the object, and that `or` evaluates to the second operand if the first is falsy.
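As a rough way to see this in action, here is a minimal sketch (not part of the original answer) that draws several values from `unique(...)` inside a single test case using `st.data()`. The test name, the 0 to 1000 bounds, and the count of five draws are illustrative assumptions. Note that uniqueness is only expected within one example; the same value can still show up again in a different example.

```python
from hypothesis import given, strategies as st


@st.composite
def unique(draw, strategy):
    # The shared set is created once per test case and reused by every
    # draw that goes through the same key.
    seen = draw(st.shared(st.builds(set), key="key-for-unique-elems"))
    return draw(
        strategy
        .filter(lambda x: x not in seen)   # skip values already drawn
        .map(lambda x: seen.add(x) or x)   # set.add() returns None, so `or` yields x
    )


@given(st.data())
def test_unique_within_one_example(data):
    # Hypothetical usage: five draws inside ONE example should be distinct.
    elems = unique(st.integers(min_value=0, max_value=1000))
    drawn = [data.draw(elems) for _ in range(5)]
    assert len(set(drawn)) == len(drawn)
```

Passing the strategy as a single `num=` argument to `@given` draws only one value per example, so every example starts with an empty `seen` set and repeats between examples are not prevented.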

edited Feb 04 '23 at 07:21

answered Sep 22 '22 at 07:16


Oops, yep - the `st.sets()` strategy needs an `elements=` argument, so I switched to `st.builds(set)` instead of `st.sets(st.nothing())` - the latter would work but it's a bit obscure for my taste, and adding `max_size=0` would be overly verbose. – Zac Hatfield-Dodds Sep 27 '22 at 05:16
Still getting repeat values. – Daniel Walker Sep 27 '22 at 15:54
Aaand, if we pass the `key=` argument to `shared()` it actually works. – Zac Hatfield-Dodds Sep 28 '22 at 00:56
Still getting repeat values. – Daniel Walker Sep 28 '22 at 13:12
Repeats in different calls to your test function are meant to be possible (and are required during shrinking, replay, and flake-detection); if you're seeing repeats within a single example, could you post your output? – Zac Hatfield-Dodds Sep 30 '22 at 07:54
0, 0, 960, 0, 744, 0, 817, ... – Daniel Walker Oct 07 '22 at 14:04
