How to find skewness and kurtosis correctly in pandas?

Question

I was wondering how to calculate skewness and kurtosis correctly in pandas. Pandas returns values for skew() and kurtosis(), but they differ noticeably from the scipy.stats values. Which one should I trust, pandas or scipy.stats?

Here is my code:

import numpy as np
import scipy.stats as stats
import pandas as pd

np.random.seed(100)
x = np.random.normal(size=(20))

kurtosis_scipy = stats.kurtosis(x)
kurtosis_pandas = pd.DataFrame(x).kurtosis()[0]

print(kurtosis_scipy, kurtosis_pandas)
# -0.5270409758168872
# -0.31467107631025604

skew_scipy = stats.skew(x)
skew_pandas = pd.DataFrame(x).skew()[0]

print(skew_scipy, skew_pandas)
# -0.41070929017558555
# -0.44478877631598901

Versions:

import scipy  # needed for scipy.__version__; only scipy.stats was imported above
print(np.__version__, pd.__version__, scipy.__version__)
1.11.0 0.20.0 0.19.0

Also related: https://stackoverflow.com/questions/50276138/how-is-pandas-kurtosis-defined/50277666#50277666 — ALollz, Jun 25 '19 at 16:19

score 8 · Accepted Answer · answered Jun 25 '19 at 16:14

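Pass `bias=False` to the scipy functions. By default scipy uses `bias=True` and returns the biased estimates, while pandas returns the bias-corrected ones: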

print(
    stats.kurtosis(x, bias=False), pd.DataFrame(x).kurtosis()[0],
    stats.skew(x, bias=False), pd.DataFrame(x).skew()[0],
    sep='\n'
)

-0.31467107631025515
-0.31467107631025604
-0.4447887763159889
-0.444788776315989
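As a rough illustration of what `bias=False` changes, the sketch below applies the standard sample-size corrections to scipy's default (biased) estimates and compares the results with the `bias=False` and pandas values. The names `g1`, `g2`, `G1` and `G2` are chosen here for illustration and do not come from the answer; the sample is the one from the question.

```python
import numpy as np
import pandas as pd
import scipy.stats as stats

np.random.seed(100)
x = np.random.normal(size=20)
n = x.size

# Biased estimates: scipy's default (bias=True).
g1 = stats.skew(x)
g2 = stats.kurtosis(x)  # excess kurtosis (Fisher definition)

# Standard sample-size corrections (illustrative names G1, G2).
G1 = np.sqrt(n * (n - 1)) / (n - 2) * g1
G2 = (n - 1) / ((n - 2) * (n - 3)) * ((n + 1) * g2 + 6)

# Each line should print three nearly identical numbers.
print(G1, stats.skew(x, bias=False), pd.Series(x).skew())
print(G2, stats.kurtosis(x, bias=False), pd.Series(x).kurt())
```

The corrections shrink toward 1 as `n` grows, so the two libraries only disagree visibly on small samples such as this one.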

piRSquared

BhishanPoudel · Answer 2 · 2021-04-19T19:56:42.770

Pandas calculates the unbiased estimator of the population excess kurtosis, which is what scipy returns with `bias=False`. See Wikipedia for the formulas: https://www.wikiwand.com/en/Kurtosis
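For reference, the estimator implemented by the from-scratch code in this answer can be written as follows. This is a transcription of that code into notation, assuming $k_2$ denotes the unbiased sample variance (`ddof=1`), $\bar{x}$ the sample mean and $n$ the sample size.

```latex
G_2 = \frac{(n+1)\,n}{(n-1)(n-2)(n-3)} \cdot
      \frac{\sum_{i=1}^{n} (x_i - \bar{x})^4}{k_2^{2}}
      \;-\; \frac{3\,(n-1)^2}{(n-2)(n-3)},
\qquad
k_2 = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2
```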

Calculate kurtosis from scratch

import numpy as np
import pandas as pd
import scipy.stats  # import the submodule explicitly so scipy.stats is available

x = np.array([0, 3, 4, 1, 2, 3, 0, 2, 1, 3, 2, 0,
              2, 2, 3, 2, 5, 2, 3, 999])
xbar = np.mean(x)    # sample mean
n = x.size           # sample size
k2 = x.var(ddof=1)   # unbiased sample variance (numpy's default, ddof=0, is biased)
sum_term = ((x - xbar)**4).sum()  # sum of fourth powers of the deviations
factor = (n+1) * n / (n-1) / (n-2) / (n-3)
second = -3 * (n-1) * (n-1) / (n-2) / (n-3)

first = factor * sum_term / k2 / k2

G2 = first + second  # unbiased sample excess kurtosis
G2 # 19.998428728659768

Calculate kurtosis using numpy/scipy

scipy.stats.kurtosis(x, bias=False) # 19.998428728659757

Calculate kurtosis using pandas

pd.DataFrame(x).kurtosis() # 19.998429

Similarly, you can also calculate skewness.
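A minimal sketch of the analogous skewness calculation, using the same data. The formula is the standard bias-corrected sample skewness; the variable names `s`, `sum_term` and `G1` are chosen here for illustration and are not from the original answer.

```python
import numpy as np
import pandas as pd
import scipy.stats

x = np.array([0, 3, 4, 1, 2, 3, 0, 2, 1, 3, 2, 0,
              2, 2, 3, 2, 5, 2, 3, 999])
xbar = x.mean()      # sample mean
n = x.size           # sample size
s = x.std(ddof=1)    # sample standard deviation (unbiased variance)
sum_term = ((x - xbar) ** 3).sum()  # sum of cubed deviations

# Bias-corrected sample skewness.
G1 = n / ((n - 1) * (n - 2)) * sum_term / s**3

# All three values should agree up to floating-point error.
print(G1, scipy.stats.skew(x, bias=False), pd.Series(x).skew())
```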

What are the `xbar` and `n` variables in the "from scratch" version? — BrunoF, Apr 19 '21 at 14:12
@BrunoFacca `xbar` is the sample mean and `n` is the sample size. I have updated the code. — BhishanPoudel, Apr 19 '21 at 19:58
