
Hypothesis complains vehemently that this strategy is slow:

import numpy
import hypothesis.extra.numpy as stnp
from hypothesis.strategies import composite

# channel_ints, shapes_2d, shape_ints, basic_shape and well_behaved_floats
# are defined further down; dt_numpy is defined elsewhere in the test module.
@composite
def f_and_g_and_padding(draw, in_channels=channel_ints, out_channels=channel_ints,
                        fs=shapes_2d, fill=None, elements=well_behaved_floats):
    shape_f = draw(basic_shape)  # NB: overwritten below, so this draw goes unused
    padding = draw(shapes_2d)
    fs = draw(fs)
    in_channels = draw(in_channels)
    out_channels = draw(out_channels)
    batch_size = draw(shape_ints)
    shape_f = (batch_size, in_channels, fs[0], fs[1])
    f = draw(stnp.arrays(dt_numpy, shape_f, elements=elements, fill=fill))
    # g must be large enough to hold f plus the padding on each side
    h_in = f.shape[2] + padding[0] * 2
    w_in = f.shape[3] + padding[1] * 2
    shape_g = (out_channels, in_channels, h_in, w_in)
    g = draw(stnp.arrays(dt_numpy, shape_g, elements=elements, fill=fill))

    return (f, g, padding)

I have tried to find out why, but failed; see How to use pytest, hypothesis and line_profiler / kernprof together?
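One crude way to measure the strategy in isolation, outside pytest, is a timing loop (just a sketch, not a proper benchmark; .example() is documented as being for interactive exploration only):

import time

strategy = f_and_g_and_padding(elements=ones)
start = time.perf_counter()
for _ in range(20):
    strategy.example()  # interactive-use only, but fine for a rough timing
print(f"{time.perf_counter() - start:.2f}s for 20 examples")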

So my question remains: why is this strategy so slow?

Here are the other strategies used:

well_behaved_floats = stnp.from_dtype(dtype=dt_numpy, allow_infinity=False, allow_nan=False)
small_floats = stnp.from_dtype(dtype=dt_numpy, min_value=-10000, max_value=10000, allow_infinity=False, allow_nan=False)
floats_0_1 = stnp.from_dtype(dtype=dt_numpy, min_value=-1, max_value=1, allow_infinity=False, allow_nan=False)
small_ints = stnp.from_dtype(dtype=numpy.dtype("i4"), allow_infinity=False, allow_nan=False, min_value=-10, max_value=10)
small_positive_ints = stnp.from_dtype(dtype=numpy.dtype("i4"), allow_infinity=False, allow_nan=False, min_value=0, max_value=10)
one_or_greater = st.integers(min_value=1)
shape_ints = st.integers(min_value=1, max_value=4)
channel_ints = st.integers(min_value=1, max_value=10)
basic_shape = stnp.array_shapes(min_dims=4, max_dims=4, min_side=1, max_side=10)
ones = st.integers(min_value=1, max_value=1)

shapes_2d = stnp.array_shapes(min_dims=2, max_dims=2, min_side=1, max_side=4)

Used like this:

from typing import Tuple
from hypothesis import given

# Tensor and run_test come from the code under test.
@given(f_and_g_and_padding(elements=ones))
def test_padding(f_g_padding: Tuple[numpy.ndarray, numpy.ndarray, Tuple[int, int]]):
    f, g, padding = f_g_padding
    run_test(Tensor(f), Tensor(g), padding=padding)

There's no filtering involved, just plain drawing and numpy arrays.

FWIW, here's the Hypothesis config:

import hypothesis
from hypothesis import HealthCheck

hypothesis.settings.register_profile("default",
                                     derandomize=True,
                                     deadline=None,
                                     print_blob=True,
                                     report_multiple_bugs=False,
                                     suppress_health_check=[HealthCheck.too_slow])
– phdoerfler

1 Answer


I'd expect that your basic_shape strategy is the culprit: with a minimum of four dimensions you're already generating n^4 elements for an average side length of n, and drawing that many elements one at a time is going to be slow. Consider reducing max_side for this strategy; if that's unacceptable, you might need to generate shapes with Hypothesis but fill in the elements with numpy.random.
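A minimal sketch of that hybrid approach, reusing the scalar strategies from the question (the function name and the float32 dtype are assumptions for illustration, not code from the question):

import numpy
import hypothesis.strategies as st
from hypothesis.strategies import composite

@composite
def f_and_g_and_padding_fast(draw):
    # Shapes and scalars are cheap, so they still come from Hypothesis.
    padding = draw(shapes_2d)
    fs = draw(shapes_2d)
    in_channels = draw(channel_ints)
    out_channels = draw(channel_ints)
    batch_size = draw(shape_ints)
    # Draw only a seed, then fill the large arrays with numpy.random.
    seed = draw(st.integers(min_value=0, max_value=2**32 - 1))
    rng = numpy.random.default_rng(seed)
    shape_f = (batch_size, in_channels, fs[0], fs[1])
    f = rng.random(shape_f, dtype=numpy.float32)
    h_in = shape_f[2] + padding[0] * 2
    w_in = shape_f[3] + padding[1] * 2
    shape_g = (out_channels, in_channels, h_in, w_in)
    g = rng.random(shape_g, dtype=numpy.float32)
    return (f, g, padding)

The trade-off is that the element values are invisible to Hypothesis's shrinker: on failure, only the seed and the shapes shrink, not the array contents.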


I'd also recommend against passing allow_infinity=False, allow_nan=False to strategies for integers or for bounded floats: in either case non-finite values are already ruled out, so the arguments don't do anything and are just a hit to readability.
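Applied to the strategies in the question, that cleanup would look like this (a sketch; well_behaved_floats keeps its arguments, since it is unbounded and they do rule something out there):

small_floats = stnp.from_dtype(dtype=dt_numpy, min_value=-10000, max_value=10000)
floats_0_1 = stnp.from_dtype(dtype=dt_numpy, min_value=-1, max_value=1)
small_ints = stnp.from_dtype(dtype=numpy.dtype("i4"), min_value=-10, max_value=10)
small_positive_ints = stnp.from_dtype(dtype=numpy.dtype("i4"), min_value=0, max_value=10)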

– Zac Hatfield-Dodds
  • Interesting idea! The shapes need to be of 4 dimensions, so reducing max_side is not an option. I hadn't considered that numpy.random could be faster than draw(stnp.arrays(…)); I shall give this a try. I also removed the nan and infinity checks, those were copy-paste leftovers, good catch. I also noticed a while after I had submitted the question that the machine I was running pytest on had 100% CPU usage because of some other user, so if pytest only checks elapsed wall time (rather than, e.g., elapsed process time or executed CPU instructions) that would have done it for sure. – phdoerfler Oct 20 '21 at 06:51
  • IIRC `pytest` does indeed consider wall time, so that could be a substantial part of it! But also note that you can leave `min_dims=4` and decrease to e.g. `max_side=5`; same dimensions, but far, far fewer elements. – Zac Hatfield-Dodds Oct 20 '21 at 10:48
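For reference, the change suggested in that last comment would look like this (the name smaller_basic_shape is made up for illustration):

import hypothesis.extra.numpy as stnp

# Same four dimensions, but at most 5**4 = 625 elements per array
# instead of 10**4 = 10,000: a 16x reduction in the worst case.
smaller_basic_shape = stnp.array_shapes(min_dims=4, max_dims=4, min_side=1, max_side=5)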