
For the purpose of verification, I would like to be able to calculate a reasonably tight upper bound on the accumulated error due to rounding to representable values during some specific arithmetic computation.

Assume that we have a function foo() that claims to perform some specific arithmetic computation. Assume also that there is an implied guarantee about the maximum error (due to rounding) stemming from the involved type being either float or double and the implied (or stated) way in which foo() carries out the computation.

I would like to be able to verify the result from foo() for a specific set of input values by also carrying out the computation in a fashion that keeps track of the accumulated worst-case error, and then check whether the two results are as close as the final worst-case error demands.

I imagine that it would be possible to do this by introducing a new arithmetic class track_prec<T> that adds precision tracking to one of the fundamental floating-point types, and then let it be up to the implementation of the arithmetic operators of that class to calculate the worst-case errors of each subexpression. My problem is that I don't know how to compute these worst-case errors in these general cases:

// T = float or double
template<class T> class track_prec {
public:
    T value;
    T ulp; // http://en.wikipedia.org/wiki/Unit_in_the_last_place

    track_prec& operator+=(const track_prec& v)
    {
        value += v.value;
        ulp = ???; // How to do this? And what about -=, *=, and /=?
        return *this;
    }

    friend bool operator==(T, const track_prec&)
    {
        // Exactly how should this comparison be done?
    }
};

Assume, for example, that foo() is a simple summation over a sequence of numbers. Then we could use track_prec<T> as follows:

std::vector<T> values { 0.4, -1.78, 1.3E4, -9.29E3, ... };
CHECK_EQUAL(std::accumulate(values.begin(), values.end(), track_prec<T>()),
            foo(values.begin(), values.end()));

Of course, any kind of help is welcome, but pointers to free and working code would be very nice.

I found these links on the subject, but they do not seem to provide direct answers to my questions.

Kristian Spangsege
  • Interval arithmetic is attractive, but it has not been hugely popular. There is little hardware support for it; most machines either do not support the necessary rounding modes or impose a high execution-time cost for switching rounding modes. Most often, when error bounds are needed, they are calculated by humans for specific situations or are estimated by various means. – Eric Postpischil Oct 06 '13 at 20:04
  • Ok, in my case, though, performance is a non-issue, since only the verification code will do the precision tracking. But do you know of any accessible C/C++ implementations that do automated calculation of error bounds? – Kristian Spangsege Oct 07 '13 at 00:06

2 Answers


The simplest way to track precision of floating-point computations is called interval arithmetic. It does not require IEEE 754 arithmetic, only that the computations can be switched to round upwards or downwards, so that the bounds of the computed intervals at each step contain all the reals that could have resulted from the same computations had they been done with real arithmetic.

You should be able to find many existing implementations of interval arithmetic.

The accuracy of the computation at any given step is the width of the interval computed at that step.

Beware of constants: if you wish to approximate π·x, you need to multiply a floating-point interval that contains π by the interval for x. If you multiply the interval for x by the double denoted as 3.1415926535897932, you will get the error for the multiplication of x by that double (which equals neither π nor 3.1415926535897932). In the code in your question, the constant 0.4 means “the double nearest to 0.4”. If you need the rational number 4/10, use the interval whose bounds are the two doubles denoted respectively by 3.9999999999999997e-01 and 4.0000000000000002e-01.

Pascal Cuoq
  • So the idea of using ULP is a dead end, or? – Kristian Spangsege Oct 06 '13 at 15:22
  • @KristianSpangsege: basically, yes. Suppose you have two positive numbers which differ only in the ULP, and which are known to be correct to within one ULP. If you add them, the result is correct within one ULP (the new ULP is worth twice the old one). But what if you subtract them? – rici Oct 06 '13 at 15:30
  • @KristianSpangsege Representing the error as a width around the results of a single floating-point computation is possible but the mathematics of it are not pleasant if you wish to do it correctly. Say that you have real numbers n1 and n2 represented as (d1 + e1) and (d2 + e2). The error of the multiplication of d1 and d2 is half the ULP of d1 * d2. But you didn't want to multiply d1 and d2, you wanted to multiply n1 and n2. So you have to write (d1 + e1)*(d2 + e2) = d1*d2 + … and include in the new error the compounded error resulting from previous approximations. – Pascal Cuoq Oct 06 '13 at 15:32
  • @KristianSpangsege Since you asked about C++ implementations, the following article refers to several of them. At least one of them must allow the “rounded interval arithmetic” that I recommend. http://www.ac.usc.es/arith19/sites/default/files/3670a231-spec-session-interval-paper1.pdf – Pascal Cuoq Oct 07 '13 at 22:26

Check out how the error tracking is implemented here. https://github.com/Esri/geometry-api-java/blob/master/src/main/java/com/esri/core/geometry/ECoordinate.java

The basic idea is that a floating-point arithmetic result is close to the correct value, within ±0.5 * DBL_EPSILON * |value|. So you can track and accumulate this error. The code in the above link calculates the absolute error of a + b as

err(a+b) = err(a) + err(b) + DBL_EPSILON * abs(a + b). 

Assumptions: IEEE 754 double, floating point operations use guard bits.

0kcats
  • This answer has absolute and relative error mixed up. Rounding error is different from the x +/- d expressions one might use to characterize measurements with an uncertainty in absolute terms. Machine epsilon instead characterizes relative error because round-off affects the mantissa. The appropriate expression for describing round-off error is rounded(x `op` y) = (x `op` y)*(1 + d) with |d| <= eps. – Praxeolitic Oct 08 '17 at 18:03
  • @Praxeolitic In my answer ulp is the estimate of the absolute error. – 0kcats Oct 09 '17 at 22:09
  • ulp(x `op` y) can be used as an upper bound for round-off error for an operation but the equations you have here don't make much sense, especially the use of epsilon. – Praxeolitic Oct 10 '17 at 09:20
  • @Praxeolitic Could you be specific as why? For addition the absolute errors are summed up and the absolute error of operation is eps * abs(result). So the total takes into account that. – 0kcats Oct 10 '17 at 16:48
  • To keep a running upper bound on accumulated absolute round-off error, for "a + b" we could add previously accumulated error of a, previously accumulated error of b, and newly introduced round-off error. For new round-off error, using ulp of the result makes sense. For the previously accumulated errors in a and b, we need to have been tracking them. There's no way to calculate the accumulated errors in those numbers just from the values (e.g. ulp(a)) because round-off error is generated by operations, it's not inherent to a floating point value. – Praxeolitic Oct 10 '17 at 17:07
  • @Praxeolitic Check out the OP question, he has a class there exactly for tracking of the error, so a and b are values that track the previously accumulated error. – 0kcats Oct 10 '17 at 17:13
  • That is ulp = this.ulp + v.ulp + DBL_EPSILON * abs(value + v.value) – 0kcats Oct 10 '17 at 17:15
  • That makes more sense. "ulp" was an unfortunate name choice by OP. Something like "err" would have been better. The use of DBL_EPSILON here still doesn't make sense though. I think you want get_ulp(abs(a+b)). – Praxeolitic Oct 10 '17 at 17:20
  • It makes sense. The error of the result of an elementary floating point operation is not exceeding and is close to abs(result) * eps. – 0kcats Oct 10 '17 at 17:23
  • I've edited the answer: renamed ulp to err, added assumptions, fixed one typo, and removed a sentence. – 0kcats Oct 11 '17 at 19:36