Is there an approximation to get the mean and standard deviation in one loop?

Question

I have a collection of n floating point values: x[n]. When I want to calculate the mean and standard deviation, I need to iterate over all values in two loops:

The first loop sums all values and calculates the mean:

    sum = 0
    for(i=0; i<n; i++)
        sum += x[i]
    mean = sum/n

In a second loop I calculate the standard deviation:

    sum = 0
    for(i=0; i<n; i++)
        sum += pow2(x[i] - mean)
    sder = sqrt(sum/n)

I am aware that you cannot reduce this complexity if you want the exact values for the mean and standard deviation. But is there a way to calculate them in less time if you only need an approximation? Preferably in one loop.

What you have there is O(n). Do you mean you want to do it in one pass? — SirGuy, Jul 07 '16 at 16:43
O(2n) is O(n). If you're using big-O notation when you want constant factor improvements, you're probably thinking about this the wrong way. — user2357112, Jul 07 '16 at 16:46

score 3 · Accepted Answer · answered Jul 07 '16 at 16:47

Have a look at this section of the Wikipedia article on standard deviation. In particular, the last formula leads to the following algorithm:

    sum = 0
    sumsqrd = 0

    for(i = 0; i < n; i++) {
        sum += x[i]
        sumsqrd += x[i] * x[i]
    }

    mean = sum / n
    stddev = sqrt(sumsqrd / n - mean * mean)
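
For readers who want to see why the last line works, here is a short derivation of the identity it relies on. The symbols \mu (for `mean`) and \sigma (for `stddev`) and the 1-based index are notation introduced here for illustration; they are not part of the original answer.

```latex
\begin{aligned}
\mu &= \frac{1}{n}\sum_{i=1}^{n} x_i \\
\sigma^2 &= \frac{1}{n}\sum_{i=1}^{n} (x_i - \mu)^2 \\
         &= \frac{1}{n}\sum_{i=1}^{n} x_i^2 \;-\; 2\mu \cdot \frac{1}{n}\sum_{i=1}^{n} x_i \;+\; \mu^2 \\
         &= \frac{1}{n}\sum_{i=1}^{n} x_i^2 \;-\; \mu^2
\end{aligned}
```

Here `sumsqrd / n` is the first term and `mean * mean` is the second, so both sums can be accumulated in the same loop.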

SirGuy

Oh man, I should have paid better attention in maths. I didn't think it would be that easy. – RomCoo Jul 07 '16 at 17:23

It's worth noting that the numerical stability of this algorithm is worse than that of computing the mean first and then computing the root mean square deviation from the mean. For example, with IEEE 754 doubles, it gives a standard deviation of 0 for [1e8+1, 1e8-1]. If you were going to accept an approximation anyway, that's probably fine, but it'd be wrong to think that this algorithm doesn't have downsides. – user2357112 Jul 07 '16 at 17:32
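
The example in the comment above can be reproduced with a small Python sketch. The function names are made up for illustration, and the `max(..., 0.0)` clamp is an added safeguard (rounding can push the difference slightly below zero), not something from the thread.

```python
import math

def two_pass_stddev(xs):
    """Population standard deviation: mean first, then squared deviations."""
    n = len(xs)
    mean = sum(xs) / n
    return math.sqrt(sum((x - mean) ** 2 for x in xs) / n)

def sum_of_squares_stddev(xs):
    """One-pass variant using sqrt(E[x^2] - mean^2)."""
    n = len(xs)
    total = 0.0
    total_sq = 0.0
    for x in xs:
        total += x
        total_sq += x * x
    mean = total / n
    # Assumption: clamp at zero so rounding error cannot make sqrt() fail.
    variance = max(total_sq / n - mean * mean, 0.0)
    return math.sqrt(variance)

data = [1e8 + 1, 1e8 - 1]
print(two_pass_stddev(data))        # 1.0
print(sum_of_squares_stddev(data))  # 0.0
```

With IEEE 754 doubles, the squares of both inputs exceed 2^53 and are rounded before they are summed, so the information about the spread is lost before the subtraction happens.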

score 3 · Answer 2 · answered Jul 11 '16 at 04:43

Here's a version that does the calculation in one pass and is numerically more stable:

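    mean = 0.0
    sum_sqrs = 0.0   # running sum of squared deviations from the current mean
    n = 0

    loop do
      x = get_x()    # next value from the data source; nil signals the end
      break if x.nil?
      delta = x - mean
      n += 1
      mean += delta / n
      sum_sqrs += delta * (x - mean)
    end
    sample_var = sum_sqrs / (n - 1)          # needs at least two values
    sample_stddev = Math.sqrt(sample_var)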

This is based on the formulas found in the bottom half of the Rapid calculation methods section of the Wikipedia page for Standard deviation.
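
As a rough illustration, the same update rule can be sketched in Python over any iterable. The function name `running_stats` and the choice to return both the population (divide by n) and sample (divide by n - 1) standard deviations are assumptions made for this sketch, not part of the answer.

```python
import math

def running_stats(values):
    """Single pass over values; returns (mean, population_std, sample_std)."""
    n = 0
    mean = 0.0
    sum_sqrs = 0.0  # running sum of squared deviations from the current mean
    for x in values:
        n += 1
        delta = x - mean                # deviation from the old mean
        mean += delta / n               # update the mean incrementally
        sum_sqrs += delta * (x - mean)  # uses both the old and the new mean
    if n == 0:
        raise ValueError("need at least one value")
    population_std = math.sqrt(sum_sqrs / n)
    # The sample estimate is undefined for a single value.
    sample_std = math.sqrt(sum_sqrs / (n - 1)) if n > 1 else float("nan")
    return mean, population_std, sample_std

mean, pop_std, samp_std = running_stats([1e8 + 1, 1e8 - 1])
print(mean, pop_std)  # 100000000.0 1.0
```

On the input [1e8+1, 1e8-1] mentioned in the comments, this sketch gives a population standard deviation of 1.0, where the sum-of-squares formula gives 0.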

pjs

@JamieMarshall May I suggest that you delete your comments since they were based on an implementation error? – pjs Jul 31 '23 at 01:24
