
I'm successfully using Welford's method to compute running variance and standard deviation, as described many times on Stack Overflow and in John D. Cook's excellent blog post.

However, in the stream of samples I sometimes encounter a "rollback" or "remove sample" order, meaning that a previous sample is no longer valid and should be removed from the calculation. I know the value of the sample to remove and when it was processed. But I'm using Welford's method precisely because I cannot go back and do another pass over all the data.

Is there an algorithm to successfully adjust my running variance to remove or negate a specific previously processed sample?

Monospace

1 Answer


Given the forward formulas

M_k = M_{k-1} + (x_k - M_{k-1}) / k
S_k = S_{k-1} + (x_k - M_{k-1}) * (x_k - M_k),

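it's possible to solve for M_{k-1} as a function of M_k, x_k, and k:

M_{k-1} = M_k - (x_k - M_k) / (k - 1).

This requires k > 1; removing the only sample simply resets the accumulator to its empty state.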
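For readers who want the intermediate step, the inversion of the mean update can be written out as follows. This is only a rearrangement of the forward formula above, using the same symbols, and it assumes k > 1:

```latex
\begin{align*}
k\,M_k &= (k-1)\,M_{k-1} + x_k \\
M_{k-1} &= \frac{k\,M_k - x_k}{k-1}
         = M_k - \frac{x_k - M_k}{k-1}
\end{align*}
```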

Then we can derive S_{k-1} straightforwardly from S_k and the rest:

S_{k-1} = S_k - (x_k - M_{k-1}) * (x_k - M_k).

It's not necessary that x_k be the last sample here; since M_k and S_k theoretically do not depend on the order of the input, we can pretend that the sample to be removed was the last to be added.
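As a rough illustration, here is a minimal Python sketch of the forward and reverse updates together. The class name `RunningStats` and its method names are hypothetical, chosen only for this example. The handling of the one-sample case and the clamp of S at zero are my own assumptions, not part of the formulas above.

```python
import math


class RunningStats:
    """Welford accumulator with a remove() that inverts add() (illustrative sketch)."""

    def __init__(self):
        self.k = 0       # number of samples currently included
        self.mean = 0.0  # M_k
        self.s = 0.0     # S_k, the sum of squared deviations from the mean

    def add(self, x):
        # Forward update: M_k from M_{k-1}, then S_k from S_{k-1}.
        self.k += 1
        old_mean = self.mean
        self.mean = old_mean + (x - old_mean) / self.k
        self.s += (x - old_mean) * (x - self.mean)

    def remove(self, x):
        # Reverse update: treat x as if it had been the last sample added.
        if self.k == 0:
            raise ValueError("no samples to remove")
        if self.k == 1:
            # Assumption: removing the only sample resets to the empty state.
            self.k, self.mean, self.s = 0, 0.0, 0.0
            return
        old_mean = self.mean - (x - self.mean) / (self.k - 1)
        self.s -= (x - old_mean) * (x - self.mean)
        # Assumption: clamp small negative values caused by rounding.
        self.s = max(self.s, 0.0)
        self.mean = old_mean
        self.k -= 1

    def variance(self):
        # Sample variance (k - 1 denominator); 0.0 when fewer than two samples.
        return self.s / (self.k - 1) if self.k > 1 else 0.0

    def std(self):
        return math.sqrt(self.variance())


stats = RunningStats()
for value in [2.0, 4.0, 4.0, 5.0, 7.0, 9.0]:
    stats.add(value)
stats.remove(4.0)  # roll back a sample that was not the most recent one
```

After the rollback, `stats.mean` and `stats.variance()` should agree, up to rounding, with a fresh accumulator fed 2, 4, 5, 7, 9. The caller is responsible for passing only values that were actually added; the sketch has no way to verify that.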

I have no idea whether this is numerically stable.

David Eisenstat