
In short: how can I execute a+b such that any loss of precision due to truncation is away from zero rather than toward zero?

The Long Story

I'm computing the sum of a long series of floating point values for the purpose of computing the sample mean and variance of the set. Since Var(X) = E(X^2) - E(X)^2, it suffices to maintain a running count of all numbers, the sum of all numbers so far, and the sum of the squares of all numbers so far.

So far so good.

However, it's absolutely required that E(X^2) >= E(X)^2, which, due to floating point rounding, isn't always the case. In pseudo-code, the problem is this:

int count;
double sum, sumOfSquares;
...
double value = <current-value>;
double sqrVal = value*value;

count++;
sum += value;             // slightly rounded down, since value is truncated to fit into sum
sumOfSquares += sqrVal;   // rounded down MORE, since the order-of-magnitude difference
                          // between sqrVal and sumOfSquares is twice that between value and sum

For variable sequences this isn't a big problem - you end up slightly under-estimating the variance, which is usually acceptable. However, for constant or almost-constant sets with a non-zero mean, it can mean that E(X^2) < E(X)^2, resulting in a negative computed variance, which violates the expectations of consuming code.
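
For a concrete illustration, here is a minimal, hypothetical C sketch of the failure mode (the chosen constant and count are arbitrary; whether the result actually dips below zero depends on the data and platform, but it can end up slightly negative rather than exactly zero):

#include <stdio.h>

int main(void) {
    const double value = 100000.01;   /* arbitrary constant with a non-zero mean */
    const int count = 1000000;
    double sum = 0.0, sumOfSquares = 0.0;

    for (int i = 0; i < count; i++) {
        sum += value;
        sumOfSquares += value * value;
    }

    double mean = sum / count;
    double variance = sumOfSquares / count - mean * mean;  /* mathematically 0 */
    printf("computed variance: %g\n", variance);           /* may be slightly negative */
    return 0;
}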

Now, I know about Kahan summation, but it isn't an attractive solution. Firstly, it makes the code susceptible to optimization vagaries (depending on optimization flags, the code may or may not exhibit this problem), and secondly, the problem isn't really one of precision - which is good enough - but that addition introduces a systematic error towards zero. If I could execute the line

sumOfSquares += sqrVal;

in such a way as to ensure that sqrVal is rounded up, not down, into the precision of sumOfSquares, I'd have a numerically reasonable solution. But how can I achieve that?

Edit: finished the question - why does pressing enter in the drop-down list in the tag field submit the question anyhow?

Eamon Nerbonne
  • Will sorting the set and doing the same computations starting from smaller values and proceeding to greater values change the situation? – sharptooth Aug 10 '09 at 08:25
  • Sorting is far more computationally expensive: O(n log(n)) time and O(n) storage, rather than linear time (with a much lower constant factor) and constant storage for the current solution. The data set I'm dealing with is arbitrarily large (the larger the better), so this is a problem. – Eamon Nerbonne Aug 10 '09 at 08:42
  • No doubt in that, but will this help in the first place? – sharptooth Aug 10 '09 at 08:44
  • It won't help, since the problem can occur even in completely constant sets - as long as sum-of-squares and sum have different truncation errors (which they will), an unfortunate series can result in a negative variance. – Eamon Nerbonne Aug 10 '09 at 09:12

3 Answers


IEEE 754 provides four rounding modes (toward -inf, toward +inf, toward 0, to nearest). Toward +inf is what you seem to want. There is no standard control for this in C90 or C++. C99 added the header <fenv.h>, which is also present as an extension in some C90 and C++ implementations. To respect the C99 standard, you'd have to write something like:

#include <assert.h>
#include <fenv.h>
#pragma STDC FENV_ACCESS ON

int old_round_mode = fegetround();
int set_round_ok = fesetround(FE_UPWARD);
assert(set_round_ok == 0);
...
set_round_ok = fesetround(old_round_mode);
assert(set_round_ok == 0);
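
Applied to the accumulation in the question, that might look roughly like the following sketch (the accumulate function is illustrative, not part of any API; it assumes the implementation supports FE_UPWARD and omits error checking of fesetround for brevity):

#include <fenv.h>
#pragma STDC FENV_ACCESS ON

void accumulate(double value, double *sum, double *sumOfSquares)
{
    *sum += value;                  /* default rounding */

    int old_round_mode = fegetround();
    fesetround(FE_UPWARD);          /* push truncation error away from zero */
    *sumOfSquares += value * value; /* square and sum both rounded upward */
    fesetround(old_round_mode);     /* restore the previous mode */
}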

It is well known that the algorithm you use is numerically unstable and has precision problems. For precision, it is better to do two passes over the data.
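
For reference, a two-pass version is straightforward. This is only a minimal sketch under the assumption that the data fits in an array (the function name is illustrative):

#include <stddef.h>

/* Two-pass variance: first compute the mean, then average the squared deviations. */
double variance_two_pass(const double *data, size_t n)
{
    double sum = 0.0;
    for (size_t i = 0; i < n; i++)
        sum += data[i];
    double mean = sum / (double)n;

    double sumSqDev = 0.0;
    for (size_t i = 0; i < n; i++) {
        double d = data[i] - mean;
        sumSqDev += d * d;          /* each term is non-negative */
    }
    return sumSqDev / (double)n;    /* divide by n - 1 for the unbiased estimate */
}

Because every term added in the second pass is a square, the result can never come out negative.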

AProgrammer
  • Using two passes is really unfortunate due to performance issues (it also makes the API uglier). As far as I can tell, the algorithm should be stable if you just round up - right? – Eamon Nerbonne Aug 10 '09 at 08:47
  • I wonder, would something like "sumOfSquares += sqrVal + sumOfSquares/(1L << 52)" be likely to be stable? – Eamon Nerbonne Aug 10 '09 at 09:00
  • @Eamon, about your first question, I've no time to do a real stability analysis, especially since I don't do that often enough to be confident in the result. The code in your second comment doesn't seem equivalent at all (did you intend to divide sqrVal instead? In that case, scaling doesn't change the stability or the precision). – AProgrammer Aug 10 '09 at 09:12
  • No, I intended sumOfSquares. Motivation: doubles have 52 bits of precision, so the 53rd bit is a potential source of error. To ensure that the estimate is only ever too high and not too low, I can simply add that 53rd bit as well. Presumably sqrVal is small enough to include the bit, and then I'm sure that any rounding error is safely below the 1/2^52 threshold. – Eamon Nerbonne Aug 11 '09 at 07:47

There's another single-pass algorithm which rearranges the calculation a bit. In pseudocode:

n = 0
mean = 0
M2 = 0

for x in data:
    n = n + 1
    delta = x - mean
    mean = mean + delta/n
    M2 = M2 + delta*(x - mean)  # This expression uses the new value of mean

variance_n = M2/n         # Sample variance
variance = M2/(n - 1)     # Unbiased estimate of population variance

(Source: http://en.wikipedia.org/wiki/Algorithms_for_calculating_variance )

This seems better behaved with respect to the issues you pointed out with the usual algorithm.
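
A direct C translation of that pseudocode might look like the sketch below (the struct and function names are illustrative; initialize all fields to zero before the first call; M2 accumulates the sum of squared deviations from the running mean):

struct running_stats {
    long   n;
    double mean;
    double M2;       /* sum of squared deviations from the current mean */
};

/* Feed one value into the running statistics. */
void stats_push(struct running_stats *s, double x)
{
    s->n    += 1;
    double delta = x - s->mean;
    s->mean += delta / (double)s->n;
    s->M2   += delta * (x - s->mean);   /* uses the updated mean, as in the pseudocode */
}

double stats_variance_n(const struct running_stats *s)
{
    return s->M2 / (double)s->n;        /* sample variance; use n - 1 for the unbiased estimate */
}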

Jim Lewis

If you don't worry about the precision, but just about getting a negative variance, why don't you simply do V(X) = max(0, E(X^2) - E(X)^2)?
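
In code that could be as simple as clamping at zero when the running sums are combined, for example with fmax from <math.h> (a sketch only; the function name is illustrative and the variable names follow the question):

#include <math.h>

/* Combine the running sums into a variance that can never be negative. */
double safe_variance(double sum, double sumOfSquares, int count)
{
    double mean = sum / count;
    return fmax(0.0, sumOfSquares / count - mean * mean);  /* clamp tiny negatives to 0 */
}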

erikkallen
  • That was my initial workaround, but I hoped to tap stackoverflow's overflowing wisdom for a better one. It's a pragmatic solution - I should probably have mentioned it ;-). – Eamon Nerbonne Aug 11 '09 at 07:44