Efficient way of requiring that a certain portion of elements from a Hypothesis strategy must be unique

Question

What is the most efficient way to require that at least a certain portion of the elements generated by a Hypothesis strategy are unique, rather than requiring that all of them are?

Strategies such as the following appear to be inefficient when composed several times, partly because of the `unique=True` condition. In my case, what matters is that at least a certain portion of the elements are unique, not that all of them are.

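```python
import hypothesis.strategies as st
import hypothesis.extra.numpy as st_np

# ARR_LEN is not defined in the question; this placeholder assumes it is a
# (min_size, max_size) tuple for the array length.
ARR_LEN = (1, 50)


def base_float(
    min_value=None,
    max_value=None,
    *,
    allow_nan=False,
    allow_infinity=False,
    allow_subnormal=False,
):
    """Strategy for returning a float."""
    return st.floats(
        min_value=min_value,
        max_value=max_value,
        allow_nan=allow_nan,
        allow_infinity=allow_infinity,
        allow_subnormal=allow_subnormal,
    )


@st.composite
def plaus_arr(draw, size=None, bounds=ARR_LEN):
    # Draw an array length within `bounds` unless a fixed size is given.
    size = draw(st.integers(*bounds)) if size is None else size
    # unique=True forces every element of the array to be distinct.
    areas = draw(st_np.arrays(float, size, elements=base_float(), unique=True))
    return areas
```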

Is the best approach some modification of the approach suggested here?

What do you mean "a certain portion are unique"? I presume you mean you need at least N unique elements out of M total? Then it's pretty important whether N/M is a small or a large fraction! Your code also has a syntax error, missing `else`-clause on the `size =` ternary statement. — Zac Hatfield-Dodds, Feb 04 '23 at 07:25
If one doesn't use the `unique` option, hypothesis generates examples where all the elements are identical. Those are implausible in this context. At a minimum, I want to exclude them. It would be helpful to be able to specify that at least a quarter or a half must be unique. Real-world data of the relevant type are mostly unique. You're of course correct about the syntax error @ZacHatfield-Dodds — a typo I will fix. Thanks. — curlew77, Feb 05 '23 at 08:38
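One way to state "at least a quarter (or half) must be unique" directly is to drop `unique=True` and reject arrays with too few distinct values. This is a sketch under assumptions: `arrays_with_min_unique_fraction` is a hypothetical helper name, the elements are finite floats as in `base_float()`, and `fill=st.nothing()` is passed because the Hypothesis docs describe it as disabling the single background fill value, which is one way arrays full of identical elements arise. Rejection through `.filter()` can still become slow when the required fraction approaches 1.

```python
import math

import numpy as np
import hypothesis.strategies as st
import hypothesis.extra.numpy as st_np


def arrays_with_min_unique_fraction(size, fraction=0.25):
    """Hypothetical helper: 1-D float arrays of length `size` in which at
    least ceil(fraction * size) of the values are distinct."""
    min_unique = math.ceil(fraction * size)
    return st_np.arrays(
        float,
        size,
        elements=st.floats(
            allow_nan=False, allow_infinity=False, allow_subnormal=False
        ),
        # Generate every element independently instead of filling most of
        # the array with one background value.
        fill=st.nothing(),
    ).filter(lambda arr: len(np.unique(arr)) >= min_unique)
```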
You can just produce a unique subsequence, extend it with a non-unique subsequence smaller than the one you got, and then optionally permute the resulting sequence. — Azat Ibrakov, Feb 15 '23 at 17:53
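The construction in the last comment can be sketched as a composite strategy. The names `arrays_with_unique_part` and `fraction` are hypothetical, and the required number of distinct values is assumed to be `ceil(fraction * size)`. Only the first block is drawn with `unique=True`, so the expensive constraint covers fewer elements; the padding is unconstrained and can only add distinct values, never remove them.

```python
import math

import numpy as np
import hypothesis.strategies as st
import hypothesis.extra.numpy as st_np


def finite_floats():
    # Same settings as base_float() in the question.
    return st.floats(allow_nan=False, allow_infinity=False, allow_subnormal=False)


@st.composite
def arrays_with_unique_part(draw, size, fraction=0.5):
    """Hypothetical strategy: a 1-D float array of length `size` with at
    least ceil(fraction * size) distinct values."""
    if not 0 <= fraction <= 1:
        raise ValueError("fraction must be between 0 and 1")
    n_unique = min(size, math.ceil(fraction * size))
    # Block of pairwise-distinct values.
    unique_part = draw(
        st_np.arrays(float, n_unique, elements=finite_floats(), unique=True)
    )
    # Unconstrained padding: it may repeat itself or the unique block.
    padding = draw(st_np.arrays(float, size - n_unique, elements=finite_floats()))
    combined = np.concatenate([unique_part, padding])
    # Let Hypothesis choose the order so the unique block is not always first.
    order = draw(st.permutations(range(size)))
    return combined[np.array(order, dtype=np.intp)]
```

Drawing the permutation with `st.permutations` rather than shuffling with NumPy's random generator keeps the ordering under Hypothesis's control, so failing examples can still be replayed and shrunk.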