
Generally the formula is given as:

`D_avg[k] = a * D_avg[k-1] + (1 - a) * D[k-1]`

but while implementing it, what if I do it like this instead, just to save one floating-point op:

`D_avg[k] = a * (D_avg[k-1] - D[k-1]) + D[k-1]`

How much does this affect precision? Or is it drastically wrong to do it this way? I know I may be paranoid about saving just one FP op, and I am ready to implement it the theoretical way, but I would still like to understand this. Whatever details and examples you can provide would be great. Thanks.

EDIT: Of course I understand that in the second way I will lose precision if I subtract two very close numbers in FP, but is that the only reason for implementing it the first way?
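
For concreteness, here are the two variants as C functions (a minimal sketch; the function and variable names are mine):

```c
/* Textbook form: 2 multiplications, 1 subtraction, 1 addition. */
double ewma_textbook(double avg, double d, double a)
{
    return a * avg + (1.0 - a) * d;
}

/* Rearranged form: 1 multiplication, 1 subtraction, 1 addition,
   saving one multiply per update. */
double ewma_rearranged(double avg, double d, double a)
{
    return a * (avg - d) + d;
}
```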

avd
  • Can you determine the impact empirically? – Brian Cain May 31 '13 at 19:40
  • Yes, of course I can do that; I thought of it :) but I wanted to know the standard practice here. I will try an experiment, though. – avd May 31 '13 at 19:43
  • Why not just use an arbitrary-precision library? Also, the "precision" of your float depends on the representation of numbers on your machine, so it depends on the actual implementation; are we sure that a theoretical/analytical study is that important? – user2384250 May 31 '13 at 19:51
  • What I've used for discrete samples at a fixed rate is `((((interval - 1.0) * oldAverage) + newValue) / interval)`. ("interval" here refers to the number of samples the average is "over", and keep in mind that interval - 1.0 is a constant and the divide can be replaced by multiplying with the reciprocal if that's presumed faster.) I think this roughly matches your first scheme. Have no idea how this stacks up to any other scheme. – Hot Licks May 31 '13 at 20:00

1 Answer


It is not a problem.

First, note that 0 ≤ a < 1, so errors in the average tend to diminish, not accumulate. Incoming new data displaces old errors.
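
For example, if the stored average carries an error ε at step k, it contributes a·ε at step k+1 and a^n·ε after n further updates; with a = 0.9, an error shrinks below 1% of its original size after about 44 updates (0.9⁴⁴ ≈ 0.0097).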

Subtracting floating-point numbers of similar magnitude (and the same sign) does not lose absolute accuracy. (You wrote “precision”, but precision is the fineness with which values are represented, e.g., the width of the `double` type, and that does not change with subtraction.) Subtracting numbers of similar magnitude may cause an increase in relative error: since the result is smaller, the error is larger relative to it. However, the relative error of an intermediate value is of no concern.

In fact, subtracting two numbers, each of which equals or exceeds half the other, has no error: The correct mathematical result is exactly representable (Sterbenz’ Lemma).
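
As a quick illustration (a self-contained sketch; the example values are mine, chosen to satisfy the lemma's condition):

```c
#include <stdio.h>

int main(void)
{
    /* 82.0 and 81.5 satisfy y/2 <= x <= 2y, so by Sterbenz' lemma
       the subtraction is exact in IEEE-754 double precision. */
    double x = 82.0, y = 81.5;
    printf("%.17g\n", x - y); /* prints 0.5 exactly, no rounding error */
    return 0;
}
```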

So the subtraction in the latter operation sequence is likely to be exact or low-error, depending on how much the values fluctuate. Then the multiplication and the addition have the usual rounding errors, and they are not particularly worrisome unless there are both positive and negative values, which can lead to large relative errors when the average is near zero. If a fused multiply-add operation is available (see `fma` in `<tgmath.h>`), then you can eliminate the error from the multiplication.
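
For instance, the rearranged update could be written with `fma` like this (a sketch assuming C99; the function name `ewma_fma` is mine):

```c
#include <math.h>  /* fma; <tgmath.h> provides the type-generic macro */

/* Computes a * (avg - d) + d with a single rounding for the
   multiply-add, removing the separate multiplication error.
   The subtraction avg - d still rounds unless Sterbenz' lemma applies. */
double ewma_fma(double avg, double d, double a)
{
    return fma(a, avg - d, d);
}
```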

In the former operation sequence, the evaluation of 1-a will be exact if a is at least ½. That leaves two multiplications and one addition. This will tend to have very slightly greater error than the latter sequence, but likely not enough to notice. As before, old errors will tend to diminish.

Eric Postpischil
  • Hi Eric, thanks a lot for your detailed answer: 1) Thanks for pointing out that errors tend to diminish since a < 1. 2) Yeah, I meant accuracy actually, not precision; I always tend to say precision when I mean accuracy, I should take care of that next time :) 3) And my average is not near 0, it's well above that, like 82 or something, so I think I don't need to use fma, right? 4) Could you please explain the last point in more detail: "In the former operation sequence, the evaluation of 1-a will be exact if a is at least ½"? I did not understand this. – avd Jun 01 '13 at 22:56
  • @avd: I am traveling and away from my reference books, but Sterbenz’ Lemma says that, in a floating-point system like IEEE-754, if x and y are finite floating-point values such that y/2 ≤ x ≤ 2y, then x–y is exactly representable. There is a formal proof but, essentially, the fact that x and y are close to each other guarantees the result has an exponent (in the floating-point encoding) less than or equal to the exponents of x and y. Therefore, it has significand bits at least as low in value as those in x and y, so it can represent the low bit of the subtraction (as well as all the others). – Eric Postpischil Jun 02 '13 at 04:04
  • @avd: You do not need `fma` if you do not care about a tiny amount of additional error. – Eric Postpischil Jun 02 '13 at 04:05