
In Python's pandas library, DataFrame.describe() reports the standard deviation of each numeric column. However, the documentation page doesn't specify whether this is the "uncorrected" standard deviation (normalized by N) or the "corrected" sample standard deviation (normalized by N-1).

Can someone tell me which one it returns?

hlin117

2 Answers


It's the corrected sample standard deviation.
You can convince yourself of this by taking a simple Series and applying the formula by hand:

In [11]: s = pd.Series([1, 2])

In [12]: s.std()
Out[12]: 0.70710678118654757

In [13]: from math import sqrt
   ....:  sqrt(0.5)
Out[13]: 0.7071067811865476

The formula for the corrected sample standard deviation gives the same value:

In [14]: sqrt(1./(len(s)-1) * ((s - s.mean()) ** 2).sum())
Out[14]: 0.7071067811865476
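
For contrast, here is a minimal sketch of what the uncorrected value looks like for the same Series. It assumes NumPy is installed alongside pandas; the variable names are only illustrative. Note that np.std defaults to ddof=0, so it does not match the pandas default.

```python
import numpy as np
import pandas as pd

s = pd.Series([1, 2])

# pandas default (ddof=1): corrected sample standard deviation
corrected = s.std()

# ddof=0 divides by N instead of N-1: uncorrected standard deviation
uncorrected = s.std(ddof=0)

# NumPy's np.std uses ddof=0 unless told otherwise
numpy_default = np.std(s.to_numpy())

print(corrected, uncorrected, numpy_default)
```

If describe() returned the uncorrected value, its std row for this Series would show 0.5 rather than roughly 0.7071.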
Andy Hayden

DataFrame.describe() calls Series.std() to compute the standard deviation. As the documentation for Series.std() says:

Return unbiased standard deviation over requested axis.

Normalized by N-1 by default. This can be changed using the ddof argument.

So the standard deviation returned by describe() is, in fact, the "corrected sample standard deviation".
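
A quick way to check this on a DataFrame is sketched below. The column names and values are made up for illustration. As far as I know, describe() itself takes no ddof argument, so the uncorrected value has to be computed separately with std(ddof=0).

```python
import pandas as pd

# Hypothetical data, just for the comparison
df = pd.DataFrame({
    "a": [1.0, 2.0, 3.0, 4.0],
    "b": [10.0, 10.0, 20.0, 20.0],
})

# The "std" row reported by describe()
described_std = df.describe().loc["std"]

# Corrected (N-1) and uncorrected (N) standard deviations per column
corrected = df.std()          # ddof=1 is the default
uncorrected = df.std(ddof=0)

# describe() should agree with the corrected version, not the uncorrected one
print((described_std - corrected).abs().max())
print(uncorrected)
```

The first print should show a difference of zero (up to floating-point error), while the uncorrected values are visibly smaller.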

Carsten