Calculating Population Standard Deviation in R

Question

Looking for a way to calculate Population Standard Deviation in R -- using greater than 10 samples. Unable to extract the source C code in R to find the method of calculation.

# Sample Standard Deviation 
# Note: All the below match with 10 or less samples
n <- 10 # 10 or greater it shifts calculation
set.seed(1)
x <- rnorm(n, 10)

# Sample Standard Deviation
sd(x)
# [1] 0.780586
sqrt(sum((x - mean(x))^2)/(n - 1))
# [1] 0.780586
sqrt(sum(x^2 - 2*mean(x)*x + mean(x)^2)/(n - 1)) # # Would like the Population Standard Deviation equivalent using this.
# [1] 0.780586
sqrt( (n/(n-1)) * ( ( (sum(x^2)/(n)) ) - (sum(x)/n) ^2 ) )
# [1] 0.780586

Now, the Population Standard Deviation needs to match sd(x) with 100 count.

# Population Standard Deviation 
n <- 100 
set.seed(1)
x <- rnorm(x, 10)

sd(x)
# [1] 0.780586

sqrt(sum((x - mean(x))^2)/(n))
# [1] 0.2341758

sqrt(sum(x^2 - 2*mean(x)*x + mean(x)^2)/(n)) 
# [1] 0.2341758

# Got this to work above using (eventual goal, to fix the below):
# https://en.wikipedia.org/wiki/Algebraic_formula_for_the_variance
sqrt( (n/(n-1)) * ( ( (sum(x^2)/(n)) ) - (sum(x)/n) ^2 ) )  # Would like the Population Standard Deviation equivalent using this.
# [1] 3.064027

G. Grothendieck · Answer 1 · 2017-06-06T14:52:54.297

11

Please check the question. The first argument of rnorm should be n.

The population and sample standard deviations are:

sqrt((n-1)/n) * sd(x) # pop
## [1] 0.8936971

sd(x) # sample
## [1] 0.8981994

They can also be calculated like this:

library(sqldf)
library(RH2)

sqldf("select stddev_pop(x), stddev_samp(x) from X")
##   STDDEV_POP("x") STDDEV_SAMP("x")
## 1       0.8936971        0.8981994

Note: We used this test data:

set.seed(1)
n <- 100
x <- rnorm(n)
X <- data.frame(x)

edited Jun 06 '17 at 14:52

answered Jun 03 '17 at 01:02

G. Grothendieck

254,981
17
203
341

Thanks for the help. Looks like it's possible to calculate without relying on mean/var/sd -- see below. – eyeOfTheStorm Jun 03 '17 at 01:43
2

but why would you want to ... ?? – Ben Bolker Jun 03 '17 at 01:44
Thanks for the RH2 example – eyeOfTheStorm Jun 03 '17 at 02:05
@BenBolker For using `Sufficient Statistics` – eyeOfTheStorm Jun 03 '17 at 02:06
hmm, I'm not sure what that means. – Ben Bolker Jun 03 '17 at 02:08
For fixed n, the population standard deviation is a scalar multiple of the sample standard deviation so they both represent the same partitioning of the sample space henace are equivalent from a sufficient statistics viewpoint. – G. Grothendieck Jun 03 '17 at 14:33

score 5 · Answer 2 · edited Feb 24 '21 at 20:02

5

I think that the easiest way is to just define it quickly from sd:

sd.p=function(x){sd(x)*sqrt((length(x)-1)/length(x))}

edited Feb 24 '21 at 20:02

Melanie Palen

2,645
6
31
50

answered Feb 24 '21 at 10:52

Bernhard

51
1
1

eyeOfTheStorm · Answer 3 · 2017-06-03T02:20:11.090

## Sample Standard Deviation  
n <- 10 # Sample count
set.seed(1)
x <- rnorm(n, 10)

sd(x) # Correct 
# [1] 0.780586
sqrt(sum((x - mean(x))^2)/(n - 1)) # Correct 
# [1] 0.780586
sqrt(sum(x^2 - 2*mean(x)*x + mean(x)^2)/(n - 1)) # Correct 
# [1] 0.780586
sqrt( (n/(n-1)) * ( ( (sum(x^2)/(n)) ) - (sum(x)/n) ^2 ) ) # Correct 
# [1] 0.780586
sqrt((sum(x^2) - (sum(x)^2/n))/(n-1)) # Correct 
# [1] 0.780586
sqrt( (n/(n - 1)) * ( (sum(x^2)/(n))  - (sum(x)/n) ^2 ) ) # Correct 
# [1] 0.780586


## Population Standard Deviation  
n <- 100 # Note: 10 or greater biases var() and sd()
set.seed(1)
x <- rnorm(n, 10)

sd(x) # Incorrect Population Standard Deviation!!
# [1] 0.8981994
sqrt(sum((x - mean(x))^2)/(n)) # Correct
# [1] 0.8936971
sqrt(sum(x^2 - 2*mean(x)*x + mean(x)^2)/(n)) # Correct 
# [1] 0.8936971
sqrt((sum(x^2) - (sum(x)^2/n))/(n)) # Correct
# [1] 0.8936971
sqrt( (n/(n)) * ( (sum(x^2)/(n))  - (sum(x)/n) ^2 ) ) # Correct 
# [1] 0.8936971

score 2 · Answer 4 · answered Nov 30 '19 at 22:12

I have just spent considerable amount of time looking for a package with a ready function for population standard deviation. These are the results:

1) radiant.data::sdpop should be a good function (see documentation)

2) multicon::popsd also works well, but check the documentation to understand what the second argument is

3) muStat::stdev with the unbiased=FALSE does not work properly. On the github page it seems that in 2012 someone set it to be sd(x)*(1-1/length(x)) instead of sd(x)*sqrt(1-1/length(x))...

4) rfml::sd.pop will not work without ml.data.frame (MarkLogic Server)

I hope this helps.

Calculating Population Standard Deviation in R

4 Answers4

Linked