
So I'm trying to compare two implementations of a function with Hypothesis, to check whether they behave the same way across a huge variety of inputs that I might not think of myself.

I tried using numpy.testing.assert_allclose to compare the outputs, but Hypothesis just repeatedly outsmarts it: the more I widen the acceptable tolerance, the larger the values Hypothesis throws at the test until it fails again, even though the outputs are similar enough to be considered the same.
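For reference, the failing test has roughly this shape (a minimal sketch; implementation_a and implementation_b are placeholders for the two resample implementations being compared, which aren't shown here):

```python
import numpy as np
from hypothesis import given, strategies as st
from hypothesis.extra.numpy import arrays


@given(
    a=arrays(
        dtype=np.float32,
        shape=st.integers(min_value=2, max_value=200),
        # NaN/inf are excluded, but the default float strategy still
        # produces arbitrarily large finite float32 values.
        elements=st.floats(allow_nan=False, allow_infinity=False, width=32),
    ),
    num=st.integers(min_value=2, max_value=200),
)
def test_resample_1d_consistency(a, num):
    np.testing.assert_allclose(
        implementation_a(a, num),  # placeholder: existing implementation
        implementation_b(a, num),  # placeholder: candidate implementation
        rtol=0.1,
        atol=0.001,
    )
```

Typical failures: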

E   Not equal to tolerance rtol=0.1, atol=0.001
...
Falsifying example: test_resample_1d_consistency(a=array([7.696582e+12, 0.000000e+00, 0.000000e+00, 0.000000e+00, 0.000000e+00, 0.000000e+00, 0.000000e+00, 0.000000e+00, 0.000000e+00, 0.000000e+00, 0.000000e+00], dtype=float32), num=11)

 

E   Not equal to tolerance rtol=0.1, atol=0.01
...
Falsifying example: test_resample_1d_consistency(a=array([7.366831e+13, 0.000000e+00, 0.000000e+00, 0.000000e+00, 0.000000e+00, 0.000000e+00, 0.000000e+00, 0.000000e+00, 0.000000e+00, 0.000000e+00, 0.000000e+00], dtype=float32), num=11)

 

E   Not equal to tolerance rtol=1000, atol=1000
...
Falsifying example: test_resample_1d_consistency(a=array([8.360933e+18, 0.000000e+00, 0.000000e+00, 0.000000e+00, 0.000000e+00, 0.000000e+00, 0.000000e+00, 0.000000e+00, 0.000000e+00, 0.000000e+00, 0.000000e+00, 0.000000e+00, 0.000000e+00, 0.000000e+00, 0.000000e+00, 0.000000e+00, 0.000000e+00, 0.000000e+00, 0.000000e+00, 0.000000e+00, 0.000000e+00, 0.000000e+00, 0.000000e+00, 0.000000e+00, 0.000000e+00, 0.000000e+00, 0.000000e+00, 0.000000e+00, 0.000000e+00, 0.000000e+00, 0.000000e+00, 0.000000e+00, 0.000000e+00, 0.000000e+00, 0.000000e+00, 0.000000e+00, 0.000000e+00, 0.000000e+00, 0.000000e+00, 0.000000e+00, 0.000000e+00, 0.000000e+00, 0.000000e+00, 0.000000e+00, 0.000000e+00, 0.000000e+00, 0.000000e+00, 0.000000e+00], dtype=float32), num=186)

etc.

So I guess I need either a different "good enough" test of similarity, or some way to limit the range of input values. But I'm not sure how to do either without missing genuinely wrong answers. Any advice?

endolith

1 Answer


It looks to me like rfft does give very different results in extreme cases - so you'll need to decide whether this is a bug or not. Maybe Hypothesis has actually shown that it's not a suitable optimization!

Put another way, determining an appropriate error tolerance for a given magnitude of input is actually the hardest part of testing! (In the literature this is known as "the oracle problem": how to distinguish good behavior from bad.)

Once you have a bound, though (say rtol=0.1, atol=0.001 for all arrays with elements in [-1000., 1000.]), you can pass the elements argument to the arrays() strategy to constrain the generated values for each test, or try a range of magnitude/tolerance combinations. A sketch follows below.
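For example, here is a sketch of such a constrained test, using the same hypothetical implementation_a/implementation_b placeholders as in the question; the [-1000., 1000.] bound and the rtol/atol values are just the illustrative numbers above:

```python
import numpy as np
from hypothesis import given, strategies as st
from hypothesis.extra.numpy import arrays


@given(
    a=arrays(
        dtype=np.float32,
        shape=st.integers(min_value=2, max_value=200),
        # Constrain element magnitudes so a single rtol/atol pair is a
        # meaningful "good enough" bound for every generated array.
        elements=st.floats(
            min_value=-1000.0, max_value=1000.0, allow_nan=False, width=32
        ),
    ),
    num=st.integers(min_value=2, max_value=200),
)
def test_resample_1d_consistency_bounded(a, num):
    np.testing.assert_allclose(
        implementation_a(a, num),
        implementation_b(a, num),
        rtol=0.1,
        atol=0.001,
    )
```

To cover a range of magnitude/tolerance combinations, you could repeat this test for several (bound, rtol, atol) triples, for example with pytest.mark.parametrize.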

Zac Hatfield-Dodds
  • `the problem of determining an appropriate error tolerance for a given magnitude of input is actually the hardest part of testing!` Yes, that's why I couldn't figure it out on my own and asked an SO question about how to decide it! :D – endolith Apr 24 '19 at 03:28