
On my computer, I can check that

(0.1 + 0.2) + 0.3 == 0.1 + (0.2 + 0.3)

evaluates to False.
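
For reference, the two groupings produce visibly different doubles (values as printed by CPython on a typical IEEE 754 binary64 platform):

>>> (0.1 + 0.2) + 0.3
0.6000000000000001
>>> 0.1 + (0.2 + 0.3)
0.6
>>> (0.1 + 0.2) + 0.3 == 0.1 + (0.2 + 0.3)
False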

More generally, I can estimate that the formula (a + b) + c == a + (b + c) fails roughly 17% of the time when a, b, c are chosen independently and uniformly from [0, 1), using the following simulation:

import numpy as np
import numexpr

np.random.seed(0)
formula = '(a + b) + c == a + (b + c)'


def failure_expectation(formula=formula, N=10**6):
    a, b, c = np.random.rand(3, N)
    return 1.0 - numexpr.evaluate(formula).mean()

print(failure_expectation())  # e.g. 0.171744
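
If numexpr is not available, an equivalent estimate can be made in plain NumPy by evaluating this particular formula directly (a sketch of the same experiment, not the code I actually ran):

import numpy as np

def failure_expectation_numpy(N=10**6):
    # Draw a, b, c independently and uniformly from [0, 1).
    a, b, c = np.random.rand(3, N)
    # Fraction of samples where the two groupings disagree.
    return 1.0 - np.mean((a + b) + c == a + (b + c))

print(failure_expectation_numpy())  # roughly 0.17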

I wonder if it is possible to arrive at this probability by hand, e.g. using the definitions in the floating-point standard and some assumptions about the uniform distribution.


Given the answer below, I assume that the following part of the original question is out of reach, at least for now.

Is there a tool that computes the failure probability for a given formula without running a simulation?

Formulas can be assumed to be simple, e.g. involving parentheses, addition, subtraction, and possibly multiplication and division.


(What follows may be an artifact of numpy random number generation, but still seems fun to explore.)

Bonus question based on an observation by NPE. We can use the following code to generate failure probabilities for uniform distributions on a sequence of intervals symmetric about zero (the code samples from [-n/2, n/2) for n in range(100)):

import pandas as pd

def failures_in_symmetric_interval(n):
    a, b, c = (np.random.rand(3, 10**4) - 0.5) * n  # uniform on [-n/2, n/2)
    return 1.0 - numexpr.evaluate(formula).mean()

s = pd.Series({
    n: failures_in_symmetric_interval(n)
    for n in range(100)
})
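
To visualise the resulting series, one option (assuming matplotlib is available) is:

import matplotlib.pyplot as plt

s.plot()  # failure probability as a function of n
plt.xlabel('n')
plt.ylabel('failure probability')
plt.show()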

The plot looks something like this:

[Plot: failure probability as a function of n for n in range(100).]

In particular, failure probability dips down to 0 when n is a power of 2 and seems to have a fractal pattern. It also looks like every "dip" has a failure probability equal to that of some previous "peak". Any elucidation of why this happens would be great!
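
For example, a quick check around two of the dips (reusing failures_in_symmetric_interval from above; the nonzero rates vary from run to run):

for n in (31, 32, 33, 63, 64, 65):
    print(n, failures_in_symmetric_interval(n))
# typically prints 0.0 for n = 32 and 64, and a clearly positive rate for the neighbouring values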

hilberts_drinking_problem
  • You probably already know this, but the answer also very much depends on the input range. For example, changing `[0; 1)` to `[0; 1000)` increases the probability to `0.254629`. Drawing samples from a range that's symmetrical about zero (e.g. `[-1; 1)`) is even more interesting! – NPE Aug 16 '19 at 08:24
  • I tried to fix ideas by setting the range to `[0,1]`, but I agree this is somewhat arbitrary. I am not sure I understand how setting range symmetric around zero makes this more interesting, but I would be glad to find out. – hilberts_drinking_problem Aug 16 '19 at 08:29
  • Try playing with `[-1; 1)`. I haven't yet figured out why it behaves the way it does. – NPE Aug 16 '19 at 08:36
  • Very nice! It looks like `a, b, c = (np.random.rand(3, N) - 0.5) * n` is special when `n` is a power of `2`. – hilberts_drinking_problem Aug 16 '19 at 08:41
  • Link: [Wikipedia article about internal float representation](https://en.wikipedia.org/wiki/Double-precision_floating-point_format). I suspect `(np.random.rand(3, 10**4) - 0.5) * n` is responsible for the "never loses precision in a power-of-two interval", because the `*n` operation only changes the (integer) exponent, not the significand/mantissa in that special case. I doubt `np.random.rand(...) * n` is equivalent to "take 64 (uniform, uncorrelated, equal-chance 0/1) random bits, make it into a float, if it is too high/low roll the dice again". – Leporello Aug 16 '19 at 09:43
  • It's definitely possible to compute these things by hand. A simpler starting point might be uniform numbers in `[1.0, 2.0]` rather than `[0.0, 1.0]` though, since then all the original numbers are in the same binade. – Mark Dickinson Aug 16 '19 at 09:57
  • My advice is to characterize the variables a and b for which an exact result is possible, and count them up. Then of course the probability in question is just (number of such values) divided by (total number of values). Whether an exact result is possible can probably be derived from reasoning about the floating point representation; you probably also need to be more specific about the input representation. I second the suggestion to look at the range [1, 2], maybe expand to other ranges if you can get that figured out. – Robert Dodier Aug 16 '19 at 16:53
  • I'm voting to close this question as off-topic because while this was an interesting question, this *really* is not related to programming at the end of the day. – Paritosh Singh Aug 19 '19 at 18:48
  • @ParitoshSingh I'm not sure I agree. The answer depends on the definitions in the IEEE standard and the implementation of `np.random.rand`. Let's see what the consensus is. – hilberts_drinking_problem Aug 19 '19 at 22:56
  • @hilberts_drinking_problem - Not fractal, ruler. A ruler is divided into 8ths or 16ths or 32nds / etc, very much like one of the chunks of your graph. OK, I see why you say fractal. But I see this as too regular. – Rick James Nov 22 '19 at 02:00

1 Answer


It's definitely possible to evaluate these things by hand, but the only methods I know are tedious and involve a lot of case-by-case enumeration.

For example, for your specific example of determining the probability that (a + b) + c == a + (b + c), that probability is 53/64, to within a few multiples of the machine epsilon. So the probability of a mismatch is 11/64, or around 17.19%, which agrees with what you were observing from your simulation.

To start with, note that there's a major simplifying factor in this particular case, and that's that Python and NumPy's "uniform-on-[0, 1]" random numbers are always of the form n/2**53 for some integer n in range(2**53), and within the constraints of the underlying Mersenne Twister PRNG, each such number is equally likely to occur. Since there are around 2**62 IEEE 754 binary64 representable values in the range [0.0, 1.0], that means that the vast majority of those IEEE 754 values aren't generated by random.random() (or np.random.rand()). This fact greatly simplifies the analysis, but also means that it's a bit of a cheat.
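
That claim is easy to sanity-check (a quick sketch: scaling by 2**53 is exact, since multiplying by a power of two only changes the exponent, so the scaled samples should all be integers):

import numpy as np

samples = np.random.rand(10**6)
scaled = samples * 2**53
print(np.all(scaled == np.floor(scaled)))  # expected: True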

Here's an incomplete sketch, just to give an idea of what's involved. To compute the value of 53/64, I had to divide into five separate cases:

  1. The case where both a + b < 1 and b + c < 1. In this case, both a + b and b + c are computed without error, and (a + b) + c and a + (b + c) therefore both give the closest float to the exact result, rounding ties to even as usual. So in this case, the probability of agreement is 1.

  2. The case where a + b < 1 and b + c >= 1. Here (a + b) + c will be the correctly rounded value of the true sum, but a + (b + c) may not be. We can divide further into subcases, depending on the parity of the least significant bits of a, b and c. Let's abuse terminology and call a "odd" if it's of the form n/2**53 with n odd, and "even" if it's of the form n/2**53 with n even, and similarly for b and c. If b and c have the same parity (which will happen half the time), then (b + c) is computed exactly and again a + (b + c) must match (a + b) + c. For the other cases, the probability of agreement is 1/2 in each case; the details are all very similar, but for example in the case where a is odd, b is odd and c is even, (a + b) + c is computed exactly, while in computing a + (b + c) we incur two rounding errors, each of magnitude exactly 2**-53. If those two errors are in opposite directions, they cancel and we get agreement. If not, we don't. Overall, there's a 3/4 probability of agreement in this case.

  3. The case where a + b >= 1 and b + c < 1. This is identical to the previous case after swapping the roles of a and c; the probability of agreement is again 3/4.

  4. a + b >= 1 and b + c >= 1, but a + b + c < 2. Again, one can split on the parities of a, b and c and look at each of the resulting 8 cases in turn. For the cases even-even-even and odd-odd-odd we always get agreement. For the case odd-even-odd, the probability of agreement turns out to be 3/4 (by yet further subanalysis). For all the other cases, it's 1/2. Putting those together gives an aggregate probability of 21/32 for this case.

  5. Case a + b + c >= 2. In this case, since we're rounding the final result to a multiple of four times 2**-53, it's necessary to look not just at the parities of a, b, and c, but to look at the last two significant bits. I'll spare you the gory details, but the probability of agreement turns out to be 13/16.

Finally, we can put all these cases together. To do that, we also need to know the probability that our triple (a, b, c) lands in each case. The probability that a + b < 1 and b + c < 1 is the volume of the square-based pyramid described by 0 <= a, b, c <= 1, a + b < 1, b + c < 1, which is 1/3. The probabilities of the other four cases can be seen (either by a bit of solid geometry, or by setting up suitable integrals) to be 1/6 each.
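
Both the per-case agreement probabilities above and these case probabilities are easy to sanity-check numerically; here's a Monte Carlo sketch (the dictionary keys follow the case numbering above):

import numpy as np

a, b, c = np.random.rand(3, 10**6)
agree = (a + b) + c == a + (b + c)

cases = {
    1: (a + b < 1) & (b + c < 1),
    2: (a + b < 1) & (b + c >= 1),
    3: (a + b >= 1) & (b + c < 1),
    4: (a + b >= 1) & (b + c >= 1) & (a + b + c < 2),
    5: a + b + c >= 2,
}

for k, mask in cases.items():
    # frequency of the case, and agreement probability within the case
    print(k, round(mask.mean(), 4), round(agree[mask].mean(), 4))

# expected frequencies: roughly 1/3, 1/6, 1/6, 1/6, 1/6
# expected agreement rates: roughly 1, 3/4, 3/4, 21/32, 13/16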

So our grand total is 1/3 * 1 + 1/6 * 3/4 + 1/6 * 3/4 + 1/6 * 21/32 + 1/6 * 13/16, which comes out to be 53/64, as claimed.
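
The same computation in exact rational arithmetic (for example with Python's fractions module):

from fractions import Fraction as F

agreement = F(1, 3) * 1 + F(1, 6) * F(3, 4) + F(1, 6) * F(3, 4) + F(1, 6) * F(21, 32) + F(1, 6) * F(13, 16)
print(agreement, 1 - agreement, float(1 - agreement))  # 53/64 11/64 0.171875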

A final note: 53/64 almost certainly isn't quite the right answer - to get a perfectly accurate answer we'd need to be careful about all the corner cases where a + b, b + c, or a + b + c hit a binade boundary (1.0 or 2.0). It would certainly be possible to refine the above approach to compute exactly how many of the 2**159 possible triples (a, b, c) satisfy (a + b) + c == a + (b + c), but not before it's time for me to go to bed. But the corner cases should constitute on the order of 1/2**53 of the total number of cases, so our estimate of 53/64 should be accurate to at least 15 decimal places.

Of course, there are lots of details missing above, but I hope it gives some idea of how it might be possible to do this.

Mark Dickinson
  • This is precisely what I was looking for. Thanks for taking your time to go through the messy case analysis, it gives a good idea of what it takes to answer these kinds of questions. – hilberts_drinking_problem Aug 19 '19 at 04:58
  • By the way, it seems interesting to find out if using the full range of representable floats leads to a different failure probability than the `numpy` example. But, as they say, this is a great job for someone else. – hilberts_drinking_problem Aug 19 '19 at 07:49