Hypothesis testing library unable to find a failing example for this simple arithmetic problem

Question

I am trying to learn the Hypothesis testing library for Python, and I came up with the following example (taken from a math channel on YouTube), which is a very simple arithmetic problem: find x, y, w, z such that

x*y = 21 & x+w = 8 & y*z = 9 & w - z = 5

The solution is x = 2.1, y = 10, w = 5.9, z = 0.9. Using hypothesis as a declarative programming library, I was expecting to find the solution rather quickly.

The code I used with hypothesis is:

from hypothesis import given
import hypothesis.strategies as st
from typing import Tuple

def f(a: float, b: float, c: float, d: float) -> Tuple[float, float, float, float]:
    return (a*b, a+c, b*d, c-d)

@given(
    st.tuples(
        st.floats(min_value=0),
        st.floats(min_value=0),
        st.floats(min_value=0),
        st.floats(min_value=0)
    )
)
def test_f(f32_tuple):
    assert f(*f32_tuple) != (21, 8, 9, 5)

After launching it a few times with pytest, hypothesis is unable to find the solution. At first I thought it was a floating point comparison problem, or maybe the search space is just too enormous, so I decided to cut it back to integers (modifying the last number in the tuple), for example:

from hypothesis import given
import hypothesis.strategies as st
from typing import Tuple

def f(a: float, b: float, c: float, d: float) -> Tuple[float, float, float, float]:
    return (a*b, a+c, b*d, c-d)

@given(
    st.tuples(
        st.integers(min_value=0, max_value=10),
        st.integers(min_value=0, max_value=10),
        st.integers(min_value=0, max_value=10),
        st.integers(min_value=0, max_value=10),
    )
)
def test_f(f32_tuple):
    assert f(*f32_tuple) != (21, 8, 9, -2)

Here, the solution would be the tuple (7, 3, 1, 3). Each integer ranges from 0 to 10 inclusive, so the search space has "only" 11^4 = 14,641 elements, and I expected it to find a solution after a few runs.

This behavior concerns me, since the usefulness of the library lies in its ability to detect cases one would not come up with normally.

Am I using the generators wrong? Or is Hypothesis unable to deal with such cases? I need to know this before I start using it on a day-to-day basis.

score 0 · Accepted Answer · answered Aug 25 '21 at 20:31

Hypothesis uses a variety of heuristics to find "interesting" inputs, but it is essentially still throwing random data at your function. By default, Hypothesis makes only 100 attempts per test. You can increase this with a decorator such as @settings(max_examples=20000). Adding this to your bounded-integer version is sufficient to get Hypothesis to find the solution:

-------------------------------------------- Hypothesis --------------------------------------------
Falsifying example: test_f(
    f32_tuple=(7, 3, 1, 3),
)
===================================== short test summary info ======================================
FAILED so_arith.py::test_f - assert (21, 8, 9, -2) != (21, 8, 9, -2)
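
Because the bounded version has a finite search space, the falsifying example can be cross-checked by exhaustive enumeration. The following is a minimal standard-library sketch, not something Hypothesis does internally; the names `TARGET`, `candidates`, and `solutions` are chosen here purely for illustration:

```python
from itertools import product
from typing import List, Tuple


def f(a: int, b: int, c: int, d: int) -> Tuple[int, int, int, int]:
    # Same function as in the question, applied to integers.
    return (a * b, a + c, b * d, c - d)


# The tuple the bounded-integer test asserts is never produced.
TARGET = (21, 8, 9, -2)

# integers(min_value=0, max_value=10) covers 0..10 inclusive.
values = range(0, 11)

# Every 4-tuple in the bounded search space.
candidates: List[Tuple[int, int, int, int]] = list(product(values, repeat=4))

# Keep only the inputs that reproduce TARGET exactly.
solutions = [t for t in candidates if f(*t) == TARGET]

print("search space size:", len(candidates))
print("solutions:", solutions)
```

Enumeration like this only works because the space is small and finite; it is a sanity check on the result, not a replacement for property-based testing.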

In many practical situations, this randomized approach works quite well! But not in the example you have here.
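
A rough back-of-the-envelope estimate suggests why the default budget falls short here. The sketch below assumes a simplified model in which every attempt draws uniformly and independently from the 11^4 = 14,641 possible tuples (four integers, each from 0 to 10). Hypothesis does not sample this way, so the numbers are only indicative; `hit_probability` is a hypothetical helper, not a Hypothesis API:

```python
# Size of the bounded search space: four integers, each in 0..10 inclusive.
SPACE_SIZE = 11 ** 4


def hit_probability(attempts: int, space_size: int = SPACE_SIZE) -> float:
    """Chance of drawing one specific tuple at least once.

    Assumes uniform, independent draws (a simplification; Hypothesis
    uses its own heuristics and is not uniform).
    """
    miss_once = 1.0 - 1.0 / space_size
    return 1.0 - miss_once ** attempts


for attempts in (100, 20_000):
    print(f"{attempts:>6} attempts -> {hit_probability(attempts):.4f}")
```

Under this simplified model, 100 attempts hit the single solution well under 1% of the time, while 20,000 attempts hit it most of the time, which is consistent with the larger max_examples setting succeeding.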

This kind of problem is best analyzed with a constraint solver. CrossHair is a solver-based system for checking Python properties, and can handle the unbounded version. (disclaimer: I am the primary maintainer!) Here is the CrossHair equivalent of your example:

from typing import Tuple

def f(a: float, b: float, c: float, d: float) -> Tuple[float, float, float, float]:
    """ post: _ != (21, 8, 9, -2) """
    return (a*b, a+c, b*d, c-d)

Running crosshair check on this file produces the output you expect:

/tmp/main.py:4: error: false when calling f(a = 7.0, b = 3.0, c = 1.0, d = 3.0) (which returns (21.0, 8.0, 9.0, -2.0))
