3

I saw this algorithm in an answer to this question.

Does this correctly calculate standard deviation? Can someone walk me through why this works mathematically? Preferably working back from this formula:

[image: standard deviation formula]

public class Statistics {

    private int n;
    private double sum;
    private double sumsq;

    public synchronized void reset() {
        this.n = 0;
        this.sum = 0.0;
        this.sumsq = 0.0;
    }

    public synchronized void addValue(double x) {
        ++this.n;
        this.sum += x;
        this.sumsq += x*x;
    }

    public synchronized double calculateMean() {
        double mean = 0.0;
        if (this.n > 0) {
            mean = this.sum/this.n;
        }
        return mean;
    }

    public synchronized double calculateStandardDeviation() {
        double deviation = 0.0;
        if (this.n > 1) {
            deviation = Math.sqrt((this.sumsq - this.sum*this.sum/this.n)/(this.n-1));
        }
        return deviation;
    }
}
kingbob939

2 Answers

2

There is a proof on Wikipedia, at the start of the section I linked to.

[image: the proof from Wikipedia]
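For reference, the identity behind the code can be written out as below. This is a sketch of the standard derivation rather than a copy of the linked image; the symbols \bar{x} (the mean of the n values) and s (the sample standard deviation) are my own notation, while sum and sumsq stand for the fields in the question's class.

```latex
\begin{aligned}
\sum_{i=1}^{n} (x_i - \bar{x})^2
  &= \sum_{i=1}^{n} x_i^2 - 2\bar{x}\sum_{i=1}^{n} x_i + n\bar{x}^2 \\
  &= \sum_{i=1}^{n} x_i^2 - 2n\bar{x}^2 + n\bar{x}^2
     \qquad \text{since } \sum_{i=1}^{n} x_i = n\bar{x} \\
  &= \sum_{i=1}^{n} x_i^2 - \frac{1}{n}\Bigl(\sum_{i=1}^{n} x_i\Bigr)^2 \\[4pt]
s &= \sqrt{\frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2}
   = \sqrt{\frac{\mathrm{sumsq} - \mathrm{sum}^2/n}{n-1}}
\end{aligned}
```

The last line is the expression inside calculateStandardDeviation(), so the class only needs to keep n, sum and sumsq.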

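By the way, I remember reading somewhere that calculating it this way can produce more rounding error. As you can see, this.sumsq can become huge, and so can this.sum*this.sum/this.n; when the two are nearly equal, subtracting them cancels most of the significant digits. The usual two-pass calculation (compute the mean first, then sum the squared deviations from it) keeps its intermediate values smaller.

Anyway, I do use this online calculation a lot, because most of the time the error doesn't matter that much.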
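If the error does start to matter, one commonly used alternative is Welford's online algorithm, which tracks the running mean and the sum of squared differences from it instead of sum and sumsq. Below is a minimal sketch; the class name RunningStats and its method names are made up for illustration, and synchronization is left out for brevity.

```java
// Hypothetical class: a sketch of Welford's online algorithm,
// not the code from the question.
final class RunningStats {
    private long n;       // number of values seen so far
    private double mean;  // running mean
    private double m2;    // running sum of squared differences from the mean

    void addValue(double x) {
        n++;
        double delta = x - mean;   // distance from the old mean
        mean += delta / n;         // move the mean toward x
        m2 += delta * (x - mean);  // old distance times new distance
    }

    double mean() {
        return n > 0 ? mean : 0.0;
    }

    // Sample standard deviation (divides by n - 1), like the question's code.
    double sampleStdDev() {
        return n > 1 ? Math.sqrt(m2 / (n - 1)) : 0.0;
    }
}
```

It is still a single pass with constant memory, but it never subtracts two huge, nearly equal numbers, so values with a large common offset (say, readings around 1e9 that differ by a few units) are handled much better than with sumsq.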

Apiwat Chantawibul
  • Bingo. If you do start caring about numerical stability, you might work out a two-step update: to get (k+1)·Var_{k+1} from k·Var_k, add (x_{k+1} - mu_k)^2, then work out an update so that you get the sum of squares of differences from the new mean instead of the old. – tmyklebu Jun 02 '13 at 05:39
  • Not sure how I missed this on the Wikipedia page, but this totally makes sense. Thanks! – kingbob939 Jun 02 '13 at 20:14
0

I believe the sample standard deviation substitutes N-1 for N in that formula, because one degree of freedom is lost when the mean is estimated from the same data; the population standard deviation divides by N. I'm not a statistician, so I don't have the proof.

The formula is correct - standard deviation is the square root of the variance.
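For comparison, here are the two conventions side by side. The notation is my own: \mu is a known population mean over N values, and \bar{x} is the mean computed from a sample of n values.

```latex
\sigma = \sqrt{\frac{1}{N}\sum_{i=1}^{N} (x_i - \mu)^2}
\qquad\text{(population)}
\qquad\qquad
s = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2}
\qquad\text{(sample)}
```

The code in the question divides by this.n-1, so it computes the sample version.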

duffymo