
Suppose we had a basic moving average function that keeps a running sum. For example:

Queue values;    // pre-filled with SMA_LENGTH zeros
double sum = 0;

double CalcSMA(double next) {
    values.push(next);
    sum -= values.pop();  // drop the value leaving the window
    sum += next;          // add the new value
    return sum / SMA_LENGTH;  // SMA_LENGTH is the window size
}

One example of how this could break down: if our window were 5 wide and we fed it something like 2, 2, 2, 2, 2, 2, 2, 1E100, 2, 2, 2, 2, 2, 2, 2, 2, the output would be 2, 2, 2, 2E99, 2E99, 2E99, 2E99, 2E99, 0, 0, 0 (where 0 stands for a number near 0, not exactly 0).

Even if the sum isn't quite that dramatically off, there are still perfectly reasonable cases where a small loss in precision makes the sum drift by a tiny amount. Over a long period of time, this adds up and becomes an issue.
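
To make this concrete, here is a small standalone C# sketch (separate from the function above; the test values are my own) demonstrating both failure modes: a huge value absorbing its small neighbours, and a running push/pop sum drifting away from a sum recomputed fresh over the current window:

using System;
using System.Collections.Generic;
using System.Linq;

class PrecisionDemo
{
    static void Main()
    {
        // Failure mode 1: a huge value absorbs its small neighbours.
        double huge = 1E100;
        Console.WriteLine(huge + 2 == huge);   // True: the 2 vanishes
        Console.WriteLine((huge + 2) - huge);  // prints 0, not 2

        // Failure mode 2: the running sum slowly drifts away from a
        // sum recomputed fresh over the current window.
        const int Window = 5;
        var values = new Queue<double>(Enumerable.Repeat(0.0, Window));
        double sum = 0.0;
        var rng = new Random(42);
        for (int i = 0; i < 10_000_000; i++)
        {
            double next = rng.NextDouble() * 1e6;
            values.Enqueue(next);
            sum -= values.Dequeue();
            sum += next;
        }
        Console.WriteLine(sum - values.Sum()); // typically small but nonzero
    }
}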

Does anyone have any ideas for how to work around this loss of precision?

EDIT: Note that this function is designed to run in O(1). That is a hard requirement, so recalculating the sum from scratch on each call won't work: the window is too large.

Jay
    I don't understand your output. Shouldn't it be more like `2, 2, 2, 2E99, 2E99, 2E99, 2E99, 2E99, 0, 0, 0, 0` (where `0` stands for a number near `0`). Surely the algorithm subtracts the `1E100` once it goes out of frame, although after that things are not very accurate. You almost need a re-reading of the entire queue, a kind of recalibration. – ooga Jul 09 '15 at 00:24
  • You are correct. I fixed the question. – Jay Jul 09 '15 at 03:28
  • You can estimate the worst case error of the sum with the usual ULP/epsilon analysis and re-sum the buffer when the error becomes too large (consult a good text on numerical computation if you don't know these terms). Or you can use arbitrary precision (bignum) math. – Gene Jul 09 '15 at 03:31
  • @Gene Arbitrary precision won't work indefinitely, though. Error analysis could work. – Jay Jul 09 '15 at 03:37
  • Actually, arbitrary precision should be fine if you implement properly. See, for example, Shewchuk's algorithm (http://code.activestate.com/recipes/393090/). Since every value eventually cancels itself out, and every value is a precise double, the expansion cannot have more elements than the window size (although it is likely to have a lot less, particularly for large windows). – rici Jul 09 '15 at 17:23
  • True, I forgot that the values cancel. But I guess the argument could then be made that worst case is O(n). (I think) – Jay Jul 10 '15 at 03:00
  • @jay: yes, the worst case is O(n), as I said in my comment. But it's *probably* a lot less, whereas recomputing the sum every time is always O(n). Anyway, Shewchuk's paper is worth a read (pdf here: http://www.cs.berkeley.edu/~jrs/papers/robustr.pdf) – rici Jul 10 '15 at 06:22

2 Answers


You could recalculate a fresh sum over every SMA_LENGTH values to stop the errors accumulating:

Queue values;           // pre-filled with SMA_LENGTH zeros
double sum = 0.0;
double freshsum = 0.0;  // second sum, accumulated from scratch
int count = 0;

double CalcSMA(double next) {
    values.push(next);
    sum -= values.pop();  // drop the value leaving the window
    sum += next;
    freshsum += next;
    if(++count == SMA_LENGTH)
    {
        // After SMA_LENGTH samples, freshsum is exactly the sum of the
        // current window, so it can replace sum and discard any drift.
        sum = freshsum;
        freshsum = 0.0;
        count = 0;
    }
    return sum / SMA_LENGTH;
}
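
To see the recalibration in action, here is a hypothetical, self-contained C# harness (Queue<double> standing in for the pseudocode Queue; the random test values are my own) that runs the function above over many samples and compares the running sum against a fresh sum of the window:

using System;
using System.Collections.Generic;
using System.Linq;

class RecalibrationDemo
{
    const int SMA_LENGTH = 5;
    static Queue<double> values = new Queue<double>(Enumerable.Repeat(0.0, SMA_LENGTH));
    static double sum = 0.0;
    static double freshsum = 0.0;
    static int count = 0;

    // Same algorithm as above, with Queue<double> for the pseudocode Queue.
    static double CalcSMA(double next)
    {
        values.Enqueue(next);
        sum -= values.Dequeue();
        sum += next;
        freshsum += next;
        if (++count == SMA_LENGTH)
        {
            sum = freshsum;
            freshsum = 0.0;
            count = 0;
        }
        return sum / SMA_LENGTH;
    }

    static void Main()
    {
        var rng = new Random(42);
        for (int i = 0; i < 10_000_000; i++)
            CalcSMA(rng.NextDouble() * 1e6);
        // With recalibration every SMA_LENGTH samples, the difference
        // stays at the scale of one window's rounding error instead of
        // growing with the length of the run:
        Console.WriteLine(sum - values.Sum());
    }
}
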
samgak

What samgak proposed doesn't actually guarantee good averages if you constantly provide it with evil values.
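
For instance (a hypothetical illustration): within a single recalibration block, freshsum is still a plain left-to-right sum, so it suffers the same absorption problem. With the window 1E100, 2, -1E100, 2, 2, the fresh sum comes out as 4 even though the exact sum is 6:

using System;

class EvilValuesDemo
{
    static void Main()
    {
        // freshsum is accumulated left to right, so the 2 after 1E100
        // is absorbed before the -1E100 cancels it out:
        double[] window = { 1E100, 2, -1E100, 2, 2 };
        double freshsum = 0.0;
        foreach (var v in window)
            freshsum += v;
        Console.WriteLine(freshsum); // prints 4; the exact sum is 6
    }
}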

You can use Neumaier's algorithm to generate accurate results in O(1) time. Something like this:

const double SMA_LENGTH = 5;
Queue<double> values;    // pre-filled with SMA_LENGTH zeros
double sum = 0.0;
double correction = 0.0;

// One step of Neumaier's compensated summation: add value to sum and
// accumulate the rounding error of the addition in correction.
static void Neumaier(double value, ref double sum, ref double correction)
{
    var t = sum + value;
    if (Math.Abs(sum) >= Math.Abs(value))
        correction += (sum - t) + value;  // low-order bits of value were lost
    else
        correction += (value - t) + sum;  // low-order bits of sum were lost
    sum = t;
}

double CalcSMA(double next)
{
    Neumaier(-values.Dequeue(), ref sum, ref correction);  // remove the oldest value
    Neumaier(next, ref sum, ref correction);               // add the new one
    values.Enqueue(next);
    return (sum + correction) / SMA_LENGTH;
}

If you have huge sequences, you can reset the window every 10^15 or so additions. This is because, with double precision, the algorithm starts losing accuracy after about 10^16 additions.
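
One way to implement that reset is sketched below, building on the fields above (the counter and threshold names are my additions, not part of the original answer): count the compensated additions and, once the threshold is crossed, re-sum the current window with the same Neumaier routine so sum and correction start fresh:

long additions = 0;
const long RESET_THRESHOLD = 1_000_000_000_000_000; // ~10^15

double CalcSMAWithReset(double next)
{
    double result = CalcSMA(next);
    additions += 2; // CalcSMA performs two compensated additions
    if (additions >= RESET_THRESHOLD)
    {
        // Re-sum the window from scratch, clearing whatever tiny error
        // the long run has accumulated. This is O(window), but only
        // once every ~10^15 updates, so the amortized cost stays O(1).
        sum = 0.0;
        correction = 0.0;
        foreach (var v in values)
            Neumaier(v, ref sum, ref correction);
        additions = 0;
    }
    return result;
}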

On the other hand, Neumaier does more work per update, so it is probably slower.

relatively_random
  • While Neumaier or Kahan summation is more accurate for a fixed-length sum, @samgak's answer has a bounded error for the moving average; your approach does not. – Joe May 08 '18 at 07:52
  • @Joe Yes, which is why I mentioned resetting the window, like samgak does. The difference is, Neumaier will actually provide accurate numbers within the given window. (Kahan isn't applicable because it assumes small elements) – relatively_random May 08 '18 at 20:25