
I'm writing tests for a statistical analysis with Hypothesis. Hypothesis led me to a ZeroDivisionError in my code when it is passed very sparse data. So I adapted my code to handle the exception; in my case, that means logging the reason and re-raising the exception:

try:
    val = calc(data)
except ZeroDivisionError:
    logger.error(f"check data: {data}, too sparse")
    raise

I need to let the exception propagate up the call stack because the top-level caller needs to know there was an exception, so that it can return an error code to the external caller (a REST API request).
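For context, the top-level handler does something roughly like the sketch below (run_analysis, the endpoint function, and the dict-shaped responses are illustrative stand-ins, not my actual API layer):

# illustrative sketch only: run_analysis and the response shapes are stand-ins
def analysis_endpoint(data):
    try:
        result = run_analysis(data)  # eventually calls calc(data) as above
    except ZeroDivisionError:
        # the re-raised exception surfaces here and becomes a client error
        return {"status": 422, "error": "data too sparse for analysis"}
    return {"status": 200, "result": result}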

Edit: I also can't assign a reasonable fallback value to val; essentially I need a histogram, and the error occurs while I'm calculating a reasonable bin width from the data, which obviously fails when the data is sparse. And without the histogram, the algorithm cannot proceed any further.
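To make the failure mode concrete, the calculation is in the spirit of the sketch below (a simplified, Freedman-Diaconis-style rule on plain Python floats, not my actual code); the point is that a degenerate spread makes the bin width zero:

import math

import numpy as np


def bin_width(values):
    # Freedman-Diaconis-style rule: 2 * IQR / cbrt(n)
    q75, q25 = np.percentile(values, [75, 25])
    return 2 * float(q75 - q25) / len(values) ** (1 / 3)


def n_bins(values):
    # with very sparse/narrow data the IQR is 0, so bin_width() returns 0.0
    # and this division raises ZeroDivisionError
    return math.ceil((max(values) - min(values)) / bin_width(values))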

Now my issue is that in my test, when I do something like this:

@given(df=dataframe())  # dataframe() is the composite strategy defined below
def test_my_calc(df):
    ...  # code that executes the above code path

Hypothesis keeps generating failing examples that trigger ZeroDivisionError, and I don't know how to ignore this exception. Normally I would mark such a test with pytest.mark.xfail(raises=ZeroDivisionError), but I can't do that here because the same test passes for well-behaved inputs.

Something like this would be ideal:

  1. continue with the test as usual for most inputs, however
  2. when ZeroDivisionError is raised, skip it as an expected failure.

How could I achieve that? Do I need to put a try: ... except: ... in the test body as well? What would I need to do in the except block to mark it as an expected failure?
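Concretely, the shape I have in mind is something like the sketch below; pytest.xfail() used imperatively is just my guess at the "mark it as an expected failure" part, and I don't know whether it interacts sensibly with Hypothesis. Here my_calc stands in for the code path above:

import pytest
from hypothesis import given


@given(df=dataframe())
def test_my_calc(df):
    try:
        result = my_calc(df)  # my_calc is a stand-in for the code path above
    except ZeroDivisionError:
        pytest.xfail("data too sparse to compute a bin width")  # placeholder
    assert result is not None  # the real assertions on the histogram go here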

Edit: to address the comment by @hoefling: separating out the failing cases would be the ideal solution, but unfortunately Hypothesis doesn't give me enough handles to control that. At most I can control the total count and the limits (min, max) of the generated data, whereas the failing cases have a very narrow spread, and there is no way for me to control for that directly. I guess that's the point of Hypothesis, and maybe I shouldn't be using it for this at all.

Here's how I generate my data (slightly simplified):

cities = [f"city{i}" for i in range(4)]
cats = [f"cat{i}" for i in range(4)]


@st.composite
def dataframe(draw):
    data_st = st.floats(min_value=0.01, max_value=50)
    df = []
    for city, cat in product(cities, cats):
        cols = [
            column("city", elements=st.just(city)),
            column("category", elements=st.just(cat)),
            column("metric", elements=data_st, fill=st.nothing()),
        ]
        _df = draw(data_frames(cols, index=range_indexes(min_size=2)))
        # my attempt to control the spread
        assume(np.var(_df["metric"]) >= 0.01)
        df += [_df]
    df = pd.concat(df, axis=0).set_index(["city", "category"])
    return df

suvayu
  • Looks to me like you're trying to combine two test cases in one. Why not have one test for input data that doesn't cause an exception (branch 1) and another one for input data that does (branch 2)? – hoefling Jul 26 '19 at 09:54
  • @hoefling that would be the ideal solution, but I have been unable to control the data generation process to achieve this. Please see my edit. – suvayu Jul 26 '19 at 10:16

1 Answer

from hypothesis import assume, given, strategies as st

@given(...)
def test_stuff(inputs):
    try:
        ...
    except ZeroDivisionError:
        assume(False)

The assume call will tell Hypothesis that this example is "bad" and it should try another, without failing the test. It's equivalent to calling .filter(will_not_cause_zero_division) on your strategy, if you had such a function. See the docs for details.
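For instance, a sketch of what that filter-based version could look like with the strategy from the question; the dense_enough predicate here is made up, and you'd substitute whatever condition actually predicts the division by zero in your code:

def dense_enough(df):
    # made-up predicate: require some spread within each (city, category) group
    return df.groupby(["city", "category"])["metric"].var().min() >= 0.01


@given(df=dataframe().filter(dense_enough))
def test_my_calc(df):
    ...  # no try/except needed; unsuitable examples are filtered out up front

The trade-off is the same as with assume(): if too many generated examples are rejected, Hypothesis will warn you via its health checks, which is a hint to build a better strategy rather than filter harder.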

Zac Hatfield-Dodds
    I didn't think of using `assume` in the test itself (only used inside strategies), very interesting approach. I'll try this out. Thanks! – suvayu Jul 28 '19 at 06:16