
I found a completely different answer to this question, so the whole original question makes no sense anymore. However, the answer may still be useful, so I have modified it a bit...

I want to sum up three double numbers, say a, b, and c, in the most numerically stable way possible. I think using a Kahan Sum would be the way to go.
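For reference, here is a minimal sketch of plain Kahan summation in Java (the method and variable names are my own, purely for illustration):

static double kahanSum(final double... values) {
  double sum = 0d;
  double c = 0d; // running compensation for lost low-order bits
  for (final double v : values) {
    final double y = v - c;   // apply the compensation first
    final double t = sum + y; // big + small: low-order bits of y get lost
    c = (t - sum) - y;        // (t - sum) recovers the rounded y; subtracting y yields the loss
    sum = t;
  }
  return sum;
}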

However, a strange thought occurred to me: Would it make sense to:

  1. First sum up a, b, and c and remember the (absolute value of the) compensation.
  2. Then sum up a, c, b
  3. If the (absolute value of the) compensation of the second sum is smaller, use this sum instead.
  4. Proceed similarly with b, a, c and the other permutations of the numbers.
  5. Return the sum with the smallest associated absolute compensation.

Would I get a more "stable" addition of three numbers this way? Or does the order of the numbers in the sum have no (usable) impact on the compensation left at the end of the summation? By (usable) I mean to ask whether the compensation value itself is stable enough to contain information that I can use?
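To make this concrete, here is a sketch of the procedure from the list above, with the Kahan loop inlined so that the final compensation can be inspected (again, method name and structure are mine and purely illustrative):

public static double add3ByPermutation(final double a, final double b,
    final double c) {
  // All six orderings of the three summands.
  final double[][] perms = { { a, b, c }, { a, c, b }, { b, a, c },
      { b, c, a }, { c, a, b }, { c, b, a } };
  double bestSum = Double.NaN;
  double bestComp = Double.POSITIVE_INFINITY;
  for (final double[] perm : perms) {
    double sum = 0d, comp = 0d;
    for (final double v : perm) { // plain Kahan summation
      final double y = v - comp;
      final double t = sum + y;
      comp = (t - sum) - y;
      sum = t;
    }
    if (Math.abs(comp) < bestComp) { // smallest leftover compensation wins
      bestComp = Math.abs(comp);
      bestSum = sum;
    }
  }
  return bestSum;
}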

(I am using the Java programming language, although I think this does not matter here.)

Many thanks, Thomas.

Thomas Weise
  • _"I am using the Java programming language, although I think this does not matter here"_ I have a feeling Prof Kahan may [have other opinions](http://www.cs.berkeley.edu/~wkahan/JAVAhurt.pdf) :-) – Simon F Nov 23 '15 at 08:38
  • Hehe, I know. But if I limit my question to strictly using `double` and only ask about the basic summation algorithm based on them, I think it should not matter. Not having flags is a pity, though, and I find it odd that I had 80-bit `Extended` floating-point numbers in Turbo Pascal and assembler - 20 years ago - but not anymore today... – Thomas Weise Nov 23 '15 at 10:17
  • 1
    Addition of exactly three operands is a special case of the general summation problem. The following paper shows, in algorithm 3, how to compute the correctly-rounded sum of three IEEE-754 floating-point operands: Sylvie Boldo and Guillaume Melquiond, "Emulation of FMA and Correctly Rounded Sums: Proved Algorithms Using Rounding to Odd", IEEE Transactions on Computers, Vol. 57, No. 4, April 2008, pp. 462-469. I programmed the algorithm, using two different emulations of round-to-odd, at the time this paper first appeared and it works well and reasonably efficient. I don't have the code anymore – njuffa Nov 23 '15 at 21:34

1 Answer


I think I found a much more reliable way to solve the "Add 3" (or "Add 4" or "Add N") numbers problem.

First of all, I implemented my idea from the original post. It resulted in quite a lot of code which, initially, seemed to work. However, it failed in the following case: adding Double.MAX_VALUE, 1, and -Double.MAX_VALUE. The result was 0.

@njuffa's comments inspired me to dig somewhat deeper, and at http://code.activestate.com/recipes/393090-binary-floating-point-summation-accurate-to-full-p/ I found that this problem has been solved quite nicely in Python. To see the full code, I downloaded the Python source (Python 3.5.1rc1 - 2015-11-23) from https://www.python.org/getit/source/, where we can find the following method (under the PYTHON SOFTWARE FOUNDATION LICENSE VERSION 2):

static PyObject*
math_fsum(PyObject *self, PyObject *seq)
{
    PyObject *item, *iter, *sum = NULL;
    Py_ssize_t i, j, n = 0, m = NUM_PARTIALS;
    double x, y, t, ps[NUM_PARTIALS], *p = ps;
    double xsave, special_sum = 0.0, inf_sum = 0.0;
    volatile double hi, yr, lo;

    iter = PyObject_GetIter(seq);
    if (iter == NULL)
        return NULL;

    PyFPE_START_PROTECT("fsum", Py_DECREF(iter); return NULL)

    for(;;) {           /* for x in iterable */
        assert(0 <= n && n <= m);
        assert((m == NUM_PARTIALS && p == ps) ||
               (m >  NUM_PARTIALS && p != NULL));

        item = PyIter_Next(iter);
        if (item == NULL) {
            if (PyErr_Occurred())
                goto _fsum_error;
            break;
        }
        x = PyFloat_AsDouble(item);
        Py_DECREF(item);
        if (PyErr_Occurred())
            goto _fsum_error;

        xsave = x;
        for (i = j = 0; j < n; j++) {       /* for y in partials */
            y = p[j];
            if (fabs(x) < fabs(y)) {
                t = x; x = y; y = t;
            }
            hi = x + y;
            yr = hi - x;
            lo = y - yr;
            if (lo != 0.0)
                p[i++] = lo;
            x = hi;
        }

        n = i;                              /* ps[i:] = [x] */
        if (x != 0.0) {
            if (! Py_IS_FINITE(x)) {
                /* a nonfinite x could arise either as
                   a result of intermediate overflow, or
                   as a result of a nan or inf in the
                   summands */
                if (Py_IS_FINITE(xsave)) {
                    PyErr_SetString(PyExc_OverflowError,
                          "intermediate overflow in fsum");
                    goto _fsum_error;
                }
                if (Py_IS_INFINITY(xsave))
                    inf_sum += xsave;
                special_sum += xsave;
                /* reset partials */
                n = 0;
            }
            else if (n >= m && _fsum_realloc(&p, n, ps, &m))
                goto _fsum_error;
            else
                p[n++] = x;
        }
    }

    if (special_sum != 0.0) {
        if (Py_IS_NAN(inf_sum))
            PyErr_SetString(PyExc_ValueError,
                            "-inf + inf in fsum");
        else
            sum = PyFloat_FromDouble(special_sum);
        goto _fsum_error;
    }

    hi = 0.0;
    if (n > 0) {
        hi = p[--n];
        /* sum_exact(ps, hi) from the top, stop when the sum becomes
           inexact. */
        while (n > 0) {
            x = hi;
            y = p[--n];
            assert(fabs(y) < fabs(x));
            hi = x + y;
            yr = hi - x;
            lo = y - yr;
            if (lo != 0.0)
                break;
        }
        /* Make half-even rounding work across multiple partials.
           Needed so that sum([1e-16, 1, 1e16]) will round-up the last
           digit to two instead of down to zero (the 1e-16 makes the 1
           slightly closer to two).  With a potential 1 ULP rounding
           error fixed-up, math.fsum() can guarantee commutativity. */
        if (n > 0 && ((lo < 0.0 && p[n-1] < 0.0) ||
                      (lo > 0.0 && p[n-1] > 0.0))) {
            y = lo * 2.0;
            x = hi + y;
            yr = x - hi;
            if (y == yr)
                hi = x;
        }
    }
    sum = PyFloat_FromDouble(hi);

_fsum_error:
    PyFPE_END_PROTECT(hi)
    Py_DECREF(iter);
    if (p != ps)
        PyMem_Free(p);
    return sum;
}

This summation method differs from Kahan's method in that it uses a variable number of compensation variables. When adding the i-th number, up to i additional compensation variables (stored in the array p) are used. This means that if I want to add 3 numbers, I may need 3 additional variables; for 4 numbers, I may need 4 additional variables. Since the number of used variables may increase from n to n+1 only after the n-th summand has been loaded, I can translate the above code to Java as follows:

/**
 * Compute the exact sum of the values in the given array
 * {@code summands} while destroying the contents of said array.
 *
 * @param summands
 *          the summand array &ndash; will be summed up and destroyed
 * @return the accurate sum of the elements of {@code summands}
 */
private static final double __destructiveSum(final double[] summands) {
  int i, j, n;
  double x, y, t, xsave, hi, yr, lo;
  boolean ninf, pinf;

  n = 0;
  lo = 0d;
  ninf = pinf = false;

  for (double summand : summands) {

    xsave = summand;
    for (i = j = 0; j < n; j++) {
      y = summands[j];
      if (Math.abs(summand) < Math.abs(y)) {
        t = summand;
        summand = y;
        y = t;
      }
      hi = summand + y;
      yr = hi - summand;
      lo = y - yr;
      if (lo != 0.0) {
        summands[i++] = lo;
      }
      summand = hi;
    }

    n = i; /* ps[i:] = [summand] */
    if (summand != 0d) {
      if ((summand > Double.NEGATIVE_INFINITY)
          && (summand < Double.POSITIVE_INFINITY)) {
        summands[n++] = summand;// all finite, good, continue
      } else {
        if (xsave <= Double.NEGATIVE_INFINITY) {
          if (pinf) {
            return Double.NaN;
          }
          ninf = true;
        } else {
          if (xsave >= Double.POSITIVE_INFINITY) {
            if (ninf) {
              return Double.NaN;
            }
            pinf = true;
          } else {
            return Double.NaN;
          }
        }

        n = 0;
      }
    }
  }

  if (pinf) {
    return Double.POSITIVE_INFINITY;
  }
  if (ninf) {
    return Double.NEGATIVE_INFINITY;
  }

  hi = 0d;
  if (n > 0) {
    hi = summands[--n];
    /*
     * sum_exact(ps, hi) from the top, stop when the sum becomes inexact.
     */
    while (n > 0) {
      x = hi;
      y = summands[--n];
      hi = x + y;
      yr = hi - x;
      lo = y - yr;
      if (lo != 0d) {
        break;
      }
    }
    /*
     * Make half-even rounding work across multiple partials. Needed so
     * that sum([1e-16, 1, 1e16]) will round-up the last digit to two
     * instead of down to zero (the 1e-16 makes the 1 slightly closer to
     * two). With a potential 1 ULP rounding error fixed-up, math.fsum()
     * can guarantee commutativity.
     */
    if ((n > 0) && (((lo < 0d) && (summands[n - 1] < 0d)) || //
        ((lo > 0d) && (summands[n - 1] > 0d)))) {
      y = lo * 2d;
      x = hi + y;
      yr = x - hi;
      if (y == yr) {
        hi = x;
      }
    }
  }
  return hi;
}

This function takes the array summands and adds up its elements while simultaneously using the array to store the compensation variables. Since the summand at index i is loaded before the array element at that index can be overwritten with a compensation value, this works.

Since the array will be small if the number of values to add is small, and since it does not escape the scope of our method, I think there is a decent chance that the JIT will allocate it directly on the stack (via escape analysis), which may make the code quite fast.

I admit that I did not fully understand why the authors of the original code handled infinities, overflows, and NaNs the way they did. Here my code deviates from the original. (I hope I did not mess it up.)

Either way, I can now sum up 3, 4, or n double numbers by doing:

public static final double add3(final double x0, final double x1,
    final double x2) {
  return __destructiveSum(new double[] { x0, x1, x2 });
}

public static final double add4(final double x0, final double x1,
    final double x2, final double x3) {
  return __destructiveSum(new double[] { x0, x1, x2, x3 });
}

If I want to sum up 3 or 4 long values and obtain the precise result as a double, I have to deal with the fact that a double can only represent a long exactly in the range -9007199254740992..9007199254740992 (that is, up to 2^53 in magnitude). But this can easily be done by splitting each long into two parts:

public static final double add3(final long x0, final long x1,
    final long x2) {
  double lx;
  return __destructiveSum(new double[] { //
      lx = x0, // high part: x0 rounded to a double
      (x0 - ((long) lx)), // low part: the rounding error, exactly representable
      lx = x1, //
      (x1 - ((long) lx)), //
      lx = x2, //
      (x2 - ((long) lx)), //
  });
}

public static final double add4(final long x0, final long x1,
    final long x2, final long x3) {
  double lx;
  return __destructiveSum(new double[] { //
      lx = x0, // high part: x0 rounded to a double
      (x0 - ((long) lx)), // low part: the rounding error, exactly representable
      lx = x1, //
      (x1 - ((long) lx)), //
      lx = x2, //
      (x2 - ((long) lx)), //
      lx = x3, //
      (x3 - ((long) lx)), //
  });
}

I think this should be about right. At least I can now add Double.MAX_VALUE, 1, and -Double.MAX_VALUE and get 1 as the result.
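A quick sanity check (just an illustrative snippet, assuming the methods above live in the same class):

public static void main(final String[] args) {
  // the case that broke my original permutation-based attempt
  System.out.println(add3(Double.MAX_VALUE, 1d, -Double.MAX_VALUE)); // 1.0
  // the half-even rounding example from the comment in the C code above:
  // the last digit should round up to two instead of down to zero
  System.out.println(add3(1e-16, 1d, 1e16)); // 1.0000000000000002E16
}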

Thomas Weise