
Say you have 100000000 32-bit floating point values in an array, and each of these floats has a value between 0.0 and 1.0. If you tried to sum them all up like this

float result = 0.0f;
for (int i = 0; i < 100000000; i++) {
    result += array[i];
}

you'd run into precision problems: once result grows much larger than the individual values, the low-order bits of each addend get rounded away, and the error accumulates.

So what are some of the ways to more accurately perform the summation?

splicer
  • Why do you expect the result to be smaller than 1? I'm confused! – lexu Mar 16 '10 at 16:53
  • I think he's saying that *once* the result passes 1.0 the problems start to arise. *What* problems I don't know, but that's how I took it. – Adam Robinson Mar 16 '10 at 16:54
  • In Python, use `math.fsum` (http://docs.python.org/library/math.html#math.fsum). – kennytm Mar 16 '10 at 16:59
  • I think from the sample code we can assume it's not Python. – Ken Mar 16 '10 at 17:01
  • @splicer: Can you be more specific - what 'problems' do you mean? – ire_and_curses Mar 16 '10 at 17:03
  • I'm trying to avoid accumulating numerical errors. The Wikipedia article that Daniel Pryden's answer links to gives a good explanation of the problem. – splicer Mar 16 '10 at 18:06
  • It seems to me there will still be a good deal of loss of precision, assuming the floats represent something (and hence can't be considered exact). With 1E8 floats, I'd expect the error to be about 1E4 times the average error, which means lots of the significant digits would be accumulated fuzz. – David Thornley Mar 16 '10 at 18:59

5 Answers


Sounds like you want to use Kahan Summation.

According to Wikipedia,

The Kahan summation algorithm (also known as compensated summation) significantly reduces the numerical error in the total obtained by adding a sequence of finite precision floating point numbers, compared to the obvious approach. This is done by keeping a separate running compensation (a variable to accumulate small errors).

In pseudocode, the algorithm is:

function kahanSum(input)
    var sum = input[1]
    var c = 0.0              // A running compensation for lost low-order bits.
    for i = 2 to input.length
        y = input[i] - c     // So far, so good: c is zero.
        t = sum + y          // Alas, sum is big, y small, so low-order digits of y are lost.
        c = (t - sum) - y    // (t - sum) recovers the high-order part of y; subtracting y recovers -(low part of y)
        sum = t              // Algebraically, c should always be zero. Beware eagerly optimising compilers!
    next i                   // Next time around, the lost low part will be added to y in a fresh attempt.
    return sum
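Translated into C, the pseudocode looks like this (a sketch; the function name `kahan_sum` and the zero-based loop are my own, not from the Wikipedia article):

```c
#include <stddef.h>

/* Compensated (Kahan) summation: c carries the low-order bits that
 * were rounded away when the previous y was folded into sum. */
float kahan_sum(const float *input, size_t n)
{
    float sum = 0.0f;
    float c = 0.0f;              /* running compensation */
    for (size_t i = 0; i < n; i++) {
        float y = input[i] - c;  /* apply last round's correction */
        float t = sum + y;       /* low-order digits of y may be lost here */
        c = (t - sum) - y;       /* recovers -(lost low part of y) */
        sum = t;
    }
    return sum;
}
```

As the pseudocode's last comment warns, this only works if the compiler does not reassociate the floating-point operations (e.g. avoid gcc's `-ffast-math`).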
Daniel Pryden
  • I am told that you must be careful about compiler optimizations, which may perform rearrangements of operations and assume associativity rules that are not correct in underflow situations. You may need to look at the intermediate code or assembly to verify. The lines of code that the compiler could break are: "t = sum + y" and "c = (t - sum) - y". With infinite precision arithmetic, (t - sum) would exactly equal y and c would always be zero. – Paul Chernoch May 15 '12 at 14:25
  • @PaulChernoch: Yes, certain compiler optimizations could break this (one of the comments even points out: "Beware eagerly optimizing compilers!"). On gcc, it should be fine unless you use `-ffast-math`. (This flag intentionally breaks some of the guarantees that IEEE-754 provides, so AFAIK it's never turned on unless you ask for it explicitly). AFAIK, no compiler by default performs optimizations that assume infinite-precision arithmetic, precisely because these kinds of operations would be broken. – Daniel Pryden May 16 '12 at 00:25

Make result a double, assuming C or C++.
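A minimal sketch of why the wider accumulator helps (the 10-million-element array of `0.1f` values is a made-up example, not from the question):

```c
#include <stddef.h>

/* Sum with a float accumulator: once the running total is much larger
 * than each addend, the low-order bits of every addend are rounded away. */
float sum_float(const float *a, size_t n)
{
    float s = 0.0f;
    for (size_t i = 0; i < n; i++) s += a[i];
    return s;
}

/* Sum with a double accumulator: the 53-bit significand absorbs the
 * low-order bits that a 24-bit float accumulator would drop. */
double sum_double(const float *a, size_t n)
{
    double s = 0.0;
    for (size_t i = 0; i < n; i++) s += a[i];
    return s;
}
```

Summing 10 million copies of `0.1f`, the float accumulator drifts far from the true 1e6, while the double accumulator stays within a fraction of a unit.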

Tuomas Pelkonen
  • Yes, that will help, but what if you have far more than 100000000 values to sum? My choice of 100000000 for this question was arbitrary. – splicer Mar 16 '10 at 18:09

If you can tolerate a little extra space (in Java):

float[] temp = new float[1000000];
float[] temp2 = new float[1000];
float sum = 0.0f;
for (int i = 0; i < 1000000000; i++) temp[i/1000] += array[i];
for (int i = 0; i < 1000000; i++) temp2[i/1000] += temp[i];
for (int i = 0; i < 1000; i++) sum += temp2[i];

Standard divide-and-conquer algorithm, basically. This only works if the numbers are randomly scattered; it won't work if the first half billion numbers are 1e-12 and the second half billion are much larger.

But before doing any of that, one might just accumulate the result in a double. That'll help a lot.
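The two-level cascade above can also be taken all the way down as fully recursive pairwise summation. A sketch in C (C rather than Java, since the question's code is C-like; the base case is my own choice):

```c
#include <stddef.h>

/* Recursive pairwise summation: split the range in half, sum each half,
 * then add the two partial sums. For inputs of similar magnitude, the
 * operands of each addition stay close in size, so the rounding error
 * grows like O(log n) rather than O(n) for the naive left-to-right loop. */
float pairwise_sum(const float *a, size_t n)
{
    if (n == 0) return 0.0f;
    if (n == 1) return a[0];
    size_t half = n / 2;
    return pairwise_sum(a, half) + pairwise_sum(a + half, n - half);
}
```

Like the cascade above, this depends on the summands having comparable magnitudes; it does not fix the pathological orderings Kahan summation handles better.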

Rex Kerr

If you're in .NET, you can use the LINQ `.Sum()` extension method that exists on `IEnumerable`. Then it would just be:

var result = array.Sum();
Rob Packwood

The absolutely optimal way is to use a priority queue, in the following way:

PriorityQueue<Float> q = new PriorityQueue<Float>();
for (float x : list) q.add(x);
while (q.size() > 1) q.add(q.poll() + q.poll());
return q.poll();

(this code assumes the numbers are positive; generally the queue should be ordered by absolute value)

Explanation: given a list of numbers, to add them up as precisely as possible you should strive to keep the operands close in magnitude, i.e. eliminate the difference between small and big ones. That's why you want to add up the two smallest numbers first: doing so increases the minimal value of the list, decreases the difference between the minimum and maximum in the list, and reduces the problem size by 1.

Unfortunately I have no idea how this can be vectorized, considering that you're using OpenCL. But I am almost sure that it can be. You might take a look at this book on vector algorithms; it is surprising how powerful they actually are: Vector Models for Data-Parallel Computing

jkff
  • Actually this is not an optimal solution. You want to minimize the absolute value of the intermediate results, which does not necessarily mean that you should always add the smallest numbers first. For example, if you want to sum [1.01, -0.001, -1.02, 0.0012], it is best to express it as (0.0012 - 0.001) + (1.01 - 1.02). – quant_dev Apr 20 '10 at 05:11