
Imagine you have a large array of floating point numbers of all kinds of sizes. What is the most accurate way to calculate the sum, with the least error? For example, when the array looks like this:

[1.0, 1e-10, 1e-10, ... 1e-10]

and you add up from left to right with a simple loop, like

sum = 0
numbers.each do |val|
    sum += val
end

then as you keep adding, the smaller numbers may fall below the precision threshold of the running sum, so the error gets bigger and bigger. As far as I know, the best way is to sort the array and add the numbers up from lowest to highest, but I am wondering if there is an even better way (faster, more precise)?
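
To make the problem concrete, here is a minimal Java sketch (with contrived, hypothetical values) showing a left-to-right loop silently dropping every small addend:

public class NaiveSumDemo {
    public static void main(String[] args) {
        // 1e-17 is below half an ulp of 1.0, so adding it to a running
        // sum that already equals 1.0 changes nothing at all.
        double[] values = new double[1000001];
        values[0] = 1.0;
        for (int i = 1; i < values.length; i++) {
            values[i] = 1e-17;
        }
        double sum = 0.0;
        for (double v : values) {
            sum += v;                    // naive left-to-right summation
        }
        System.out.println(sum);         // prints 1.0
        System.out.println(1.0 + 1e-11); // the true sum, about 1.00000000001
    }
}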

EDIT: Thanks for the answers; I now have working code that perfectly sums up double values in Java. It is a straight port of the Python code from the winning answer. The solution passes all of my unit tests. (A longer but optimized version of this is available here: Summarizer.java)

import java.util.ArrayList;
import java.util.Collection;
import java.util.List;

/**
 * Adds up numbers in an array with perfect precision, and in O(n).
 * 
 * @see http://code.activestate.com/recipes/393090/
 */
public class Summarizer {

    /**
     * Perfectly sums up numbers, without rounding errors (if at all possible).
     * 
     * @param values
     *            The values to sum up.
     * @return The sum.
     */
    public static double msum(double... values) {
        List<Double> partials = new ArrayList<Double>();
        for (double x : values) {
            int i = 0;
            for (double y : partials) {
                // Ensure |x| >= |y| so that lo below captures the exact rounding error.
                if (Math.abs(x) < Math.abs(y)) {
                    double tmp = x;
                    x = y;
                    y = tmp;
                }
                double hi = x + y;         // rounded sum of x and y
                double lo = y - (hi - x);  // exact error of that rounded sum
                if (lo != 0.0) {
                    partials.set(i, lo);
                    ++i;
                }
                x = hi;
            }
            if (i < partials.size()) {
                partials.set(i, x);
                partials.subList(i + 1, partials.size()).clear();
            } else {
                partials.add(x);
            }
        }
        return sum(partials);
    }

    /**
     * Sums up the rest of the partial numbers which cannot be summed up without
     * loss of precision.
     */
    public static double sum(Collection<Double> values) {
        double s = 0.0;
        for (Double d : values) {
            s += d;
        }
        return s;
    }
}
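
For example (a contrived check, not one of the actual unit tests): a naive left-to-right loop over these values returns 0.0, because both 1e-100 terms are absorbed by the 1.0 before it is cancelled away, while msum keeps them:

double exact = Summarizer.msum(1.0, 1e-100, 1e-100, -1.0);
System.out.println(exact);  // prints 2.0E-100, the mathematically exact sum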
martinus
  • Don't ignore that the result cannot possibly be more precise than the summand with the largest absolute value. – greybeard Jul 18 '19 at 06:57

5 Answers


For "more precise": this recipe in the Python Cookbook has summation algorithms which keep the full precision (by keeping track of the subtotals). Code is in Python but even if you don't know Python it's clear enough to adapt to any other language.

All the details are given in this paper.

dF.
  • Since I am not a Python guy, what does partials[i:] = [x] do? – martinus Dec 26 '08 at 20:32
  • partials[i:] is a slice of the list named partials: the part of the list from index i to the end. aList[i:] = [x] cuts off the part of the list from the i-th element onward and replaces it with [x] (a list containing only x). – Abgan Dec 26 '08 at 21:32
  • @martinus: `partials[i:] = [x]` replaces a slice `partials[i:]` with a single-element list `[x]`. For example: `a = [0,1,2,3,4]; assert a[2:] == [2,3,4]; a[2:] = [-1]; assert a[2:] == [-1] and a == [0,1,-1]`. – jfs Dec 26 '08 at 21:34
  • An addition: if `i` is greater than or equal to the size of the list, then it appends the element `x` to the `partials` list. `a = [0,1]; a[10:] = [-1]; assert a == [0,1,-1]`. This is because in that case a[10:] returns an empty slice. Thus, using slice notation you can add, remove, and replace list elements. – jfs Dec 26 '08 at 21:45

See also: the Kahan summation algorithm. It does not require O(n) storage, only O(1).
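
For reference, a minimal Java sketch of Kahan (compensated) summation; the class and variable names are purely illustrative:

public class KahanSum {
    public static double kahanSum(double... values) {
        double sum = 0.0;
        double c = 0.0;               // running compensation for lost low-order bits
        for (double v : values) {
            double y = v - c;         // remove the error carried over from the last step
            double t = sum + y;       // big + small: low-order bits of y may be lost here
            c = (t - sum) - y;        // algebraically zero; numerically, the lost part of y
            sum = t;
        }
        return sum;
    }
}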

quant_dev
  • Technically this is possibly not the most accurate answer to the question, but I found it much more useful in practice. It's O(n) in time and O(1) in storage, same as the original naive code, so there's no reason not to use this algorithm, at the very least, when summing numbers. – flodin Jun 13 '10 at 08:08

There are many algorithms, depending on what you want. Usually they require keeping track of the partial sums. If you keep only the sums x[k+1] - x[k], you get the Kahan algorithm. If you keep track of all the partial sums (hence yielding an O(n^2) algorithm), you get @dF's answer.

Note that, in addition to your problem, summing numbers of different signs is very problematic.

Now, there are simpler recipes than keeping track of all the partial sums:

  • Sort the numbers before summing, and sum all the negatives and all the positives independently. If you already have sorted numbers, fine; otherwise you have an O(n log n) algorithm. Sum by increasing magnitude.
  • Sum by pairs, then pairs of pairs, and so on (see the sketch below).

Personal experience shows that you usually don't need fancier things than Kahan's method.
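
A minimal, illustrative sketch of the pairwise ("sum by pairs") idea in Java; the names are arbitrary, not from this answer:

public class PairwiseSum {
    // Recursively split the array in half and sum each half, so rounding
    // errors grow roughly like O(log n) instead of O(n).
    public static double pairwiseSum(double[] values, int from, int to) {
        int length = to - from;
        if (length == 0) return 0.0;
        if (length == 1) return values[from];
        int mid = from + length / 2;
        return pairwiseSum(values, from, mid) + pairwiseSum(values, mid, to);
    }

    public static double pairwiseSum(double[] values) {
        return pairwiseSum(values, 0, values.length);
    }
}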

Alexandre C.

Well, if you don't want to sort, then you could simply keep the total in a variable with a type of higher precision than the individual values (e.g. use a double to keep the sum of floats, or a "quad" to keep the sum of doubles). This will impose a performance penalty, but it might be less than the cost of sorting.
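
Java has no built-in "quad" type, so here is a minimal sketch of just the first half of that suggestion, accumulating float values in a double (names are illustrative):

public class WideAccumulatorSum {
    public static double sumFloatsInDouble(float[] values) {
        double sum = 0.0;       // double carries roughly 29 more significand bits than float
        for (float v : values) {
            sum += v;           // each float converts to double exactly
        }
        return sum;
    }
}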

Wedge
  • This would help for short lists, but eventually you would still have trouble. Imagine a list with a huge number of small values. After a time, the small values would fall under the precision threshold of the partial sum and be lost – Sompom Feb 18 '19 at 17:56

If your application relies on numeric processing, search for an arbitrary-precision arithmetic library (however, I don't know if there are Python libraries of this kind). Of course, it all depends on how many digits of precision you want -- you can achieve good results with standard IEEE floating point if you use it with care.
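
In Java, for instance, java.math.BigDecimal is one readily available arbitrary-precision option; a minimal, illustrative sketch (the answer above does not name a specific library):

import java.math.BigDecimal;

public class ExactSum {
    public static double exactSum(double... values) {
        BigDecimal sum = BigDecimal.ZERO;
        for (double v : values) {
            sum = sum.add(new BigDecimal(v));  // new BigDecimal(double) converts the value exactly
        }
        return sum.doubleValue();              // a single rounding at the very end
    }
}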

Hernán