
Say you have 100000000 32-bit floating point values in an array, and each of these floats has a value between 0.0 and 1.0. If you tried to sum them all up like this

float result = 0.0f;
for (int i = 0; i < 100000000; i++) {
    result += array[i];
}

you'd run into precision problems: once result grows much larger than the individual values, the low-order bits of each addend get rounded away, and the error accumulates.

So what are some of the ways to more accurately perform the summation?

splicer
  • Why do you expect the result to be smaller than 1? I'm confused! – lexu Mar 16 '10 at 16:53
  • I think he's saying that *once* the result passes 1.0 the problems start to arise. *What* problems I don't know, but that's how I took it. – Adam Robinson Mar 16 '10 at 16:54
  • In Python, use `math.fsum` (http://docs.python.org/library/math.html#math.fsum). – kennytm Mar 16 '10 at 16:59
  • I think from the sample code we can assume it's not Python. – Ken Mar 16 '10 at 17:01
  • @splicer: Can you be more specific - what 'problems' do you mean? – ire_and_curses Mar 16 '10 at 17:03
  • I'm trying to avoid accumulating numerical errors. The Wikipedia article that Daniel Pryden's answer links to gives a good explanation of the problem. – splicer Mar 16 '10 at 18:06
  • It seems to me there will still be a good deal of loss of precision, assuming the floats represent something (and hence can't be considered exact). With 1E8 floats, I'd expect the error to be about 1E4 times the average error, which means lots of the significant digits would be accumulated fuzz. – David Thornley Mar 16 '10 at 18:59

5 Answers


Sounds like you want to use Kahan Summation.

According to Wikipedia,

The Kahan summation algorithm (also known as compensated summation) significantly reduces the numerical error in the total obtained by adding a sequence of finite precision floating point numbers, compared to the obvious approach. This is done by keeping a separate running compensation (a variable to accumulate small errors).

In pseudocode, the algorithm is:

function kahanSum(input)
    var sum = input[1]
    var c = 0.0              // A running compensation for lost low-order bits.
    for i = 2 to input.length
        y = input[i] - c     // So far, so good: c is zero.
        t = sum + y          // Alas, sum is big, y small, so low-order digits of y are lost.
        c = (t - sum) - y    // (t - sum) recovers the high-order part of y; subtracting y recovers -(low part of y)
        sum = t              // Algebraically, c should always be zero. Beware eagerly optimising compilers!
    next i                   // Next time around, the lost low part will be added to y in a fresh attempt.
    return sum
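Translated into C, the pseudocode looks like this (a sketch; the function name `kahan_sum` and the zero-based loop are my own, not from the Wikipedia article):

```c
#include <stddef.h>

/* Compensated (Kahan) summation: c carries the low-order bits that
 * were rounded away when the previous y was folded into sum. */
float kahan_sum(const float *input, size_t n)
{
    float sum = 0.0f;
    float c = 0.0f;              /* running compensation */
    for (size_t i = 0; i < n; i++) {
        float y = input[i] - c;  /* apply last round's correction */
        float t = sum + y;       /* low-order digits of y may be lost here */
        c = (t - sum) - y;       /* recovers -(lost low part of y) */
        sum = t;
    }
    return sum;
}
```

As the pseudocode's last comment warns, this only works if the compiler does not reassociate the floating-point operations (e.g. avoid gcc's `-ffast-math`).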
Daniel Pryden
  • I am told that you must be careful about compiler optimizations, which may perform rearrangements of operations and assume associativity rules that are not correct in underflow situations. You may need to look at the intermediate code or assembly to verify. The lines of code that the compiler could break are: "t = sum + y" and "c = (t - sum) - y". With infinite precision arithmetic, (t - sum) would exactly equal y and c would always be zero. – Paul Chernoch May 15 '12 at 14:25
  • @PaulChernoch: Yes, certain compiler optimizations could break this (one of the comments even points out: "Beware eagerly optimizing compilers!"). On gcc, it should be fine unless you use `-ffast-math`. (This flag intentionally breaks some of the guarantees that IEEE-754 provides, so AFAIK it's never turned on unless you ask for it explicitly). AFAIK, no compiler by default performs optimizations that assume infinite-precision arithmetic, precisely because these kinds of operations would be broken. – Daniel Pryden May 16 '12 at 00:25

Make result a double, assuming C or C++.
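A minimal sketch of why the wider accumulator helps (the 10-million-element array of `0.1f` values is a made-up example, not from the question):

```c
#include <stddef.h>

/* Sum with a float accumulator: once the running total is much larger
 * than each addend, the low-order bits of every addend are rounded away. */
float sum_float(const float *a, size_t n)
{
    float s = 0.0f;
    for (size_t i = 0; i < n; i++) s += a[i];
    return s;
}

/* Sum with a double accumulator: the 53-bit significand absorbs the
 * low-order bits that a 24-bit float accumulator would drop. */
double sum_double(const float *a, size_t n)
{
    double s = 0.0;
    for (size_t i = 0; i < n; i++) s += a[i];
    return s;
}
```

Summing 10 million copies of `0.1f`, the float accumulator drifts far from the true 1e6, while the double accumulator stays within a fraction of a unit.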

Tuomas Pelkonen
  • Yes, that will help, but what if you have far more than 100000000 values to sum? My choice of 100000000 for this question was arbitrary. – splicer Mar 16 '10 at 18:09

If you can tolerate a little extra space (in Java):

float[] temp = new float[1000000];
float[] temp2 = new float[1000];
float sum = 0.0f;
for (int i = 0; i < 1000000000; i++) temp[i/1000] += array[i];
for (int i = 0; i < 1000000; i++) temp2[i/1000] += temp[i];
for (int i = 0; i < 1000; i++) sum += temp2[i];

Standard divide-and-conquer algorithm, basically. This only works if the numbers are randomly scattered; it won't work if the first half billion numbers are 1e-12 and the second half billion are much larger.

But before doing any of that, one might just accumulate the result in a double. That'll help a lot.
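The two-level cascade above can also be taken all the way down as fully recursive pairwise summation. A sketch in C (C rather than Java, since the question's code is C-like; the base case is my own choice):

```c
#include <stddef.h>

/* Recursive pairwise summation: split the range in half, sum each half,
 * then add the two partial sums. For inputs of similar magnitude, the
 * operands of each addition stay close in size, so the rounding error
 * grows like O(log n) rather than O(n) for the naive left-to-right loop. */
float pairwise_sum(const float *a, size_t n)
{
    if (n == 0) return 0.0f;
    if (n == 1) return a[0];
    size_t half = n / 2;
    return pairwise_sum(a, half) + pairwise_sum(a + half, n - half);
}
```

Like the cascade above, this depends on the summands having comparable magnitudes; it does not fix the pathological orderings Kahan summation handles better.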

Rex Kerr

If you're in .NET, you can use the LINQ `.Sum()` extension method that exists on `IEnumerable`. Then it would just be:

var result = array.Sum();
Rob Packwood

The absolutely optimal way is to use a priority queue, in the following way:

PriorityQueue<Float> q = new PriorityQueue<Float>();
for (float x : list) q.add(x);
while (q.size() > 1) q.add(q.poll() + q.poll());
return q.poll();

(this code assumes the numbers are positive; generally the queue should be ordered by absolute value)

Explanation: given a list of numbers, to add them up as precisely as possible you should strive to keep the operands close in magnitude, i.e. eliminate the difference between small and big ones. That's why you want to add up the two smallest numbers first: doing so increases the minimal value of the list, decreases the difference between the minimum and maximum in the list, and reduces the problem size by 1.

Unfortunately I have no idea how this can be vectorized, considering that you're using OpenCL. But I am almost sure that it can be. You might take a look at this book on vector algorithms; it is surprising how powerful they actually are: Vector Models for Data-Parallel Computing

jkff
  • Actually this is not an optimal solution. You want to minimize the absolute value of the intermediate results, which does not necessarily mean that you should always add the smallest numbers first. For example, if you want to sum [1.01, -0.001, -1.02, 0.0012], it is best to express it as (0.0012 - 0.001) + (1.01 - 1.02). – quant_dev Apr 20 '10 at 05:11