What is the difference between this code and this formula?

Question

I am writing an algorithm that calculates the kurtosis of the distribution of daily returns, and I am trying to get my result to match Excel's. Excel's calculation supposedly uses the formula at the top of this webpage: http://www.macroption.com/kurtosis-excel-kurt/

Here is my code used to emulate that formula (returns is a numpy array consisting of the series of daily returns):

def kurtosis(returns):
    n = len(returns)
    avg = np.average(returns)
    std = np.std(returns)
    coefficient = 1.0 * n * (n+1) / ((n-1) * (n-2) * (n-3) * std**4.0)
    term = (3 * (n-1)**2.0) / ((n-2) * (n-3))
    summation = 0

    for x in returns: 
        summation += ( (x - avg) ) ** 4.0
    kurt = coefficient * summation - term

    return kurt

Apparently there is a difference between the formula used by Excel and my code: Excel gives a kurtosis of 1.94, while my code gives 2.81.

Does anyone have a clue as to why the two values are different?

You might try a different `ddof` for `std`, e.g. `ddof=1`. Check that function's documentation. Usually the effect of choosing population vs. sample dof is small, but with `s**4`, the effect will be amplified. — hpaulj, Aug 19 '15 at 02:53
You might also play with `scipy.stats.kurtosis`, and look up other `[numpy] kurtosis` SO questions. — hpaulj, Aug 19 '15 at 03:04
What is the data sample that gives the different values? If it is too big to post here, can you find a small data sample that you could post here that yields different answers? — Paul, Aug 19 '15 at 04:21
(not a solution) You can probably replace your `for` loop with `summation=np.sum((returns-avg)**4)` — yevgeniy, Aug 19 '15 at 09:49
@hpaulj I can confirm that using `std = np.std(returns, ddof = 1)` gives the same results as Excel using the data from [Microsoft's kurtosis help page](https://support.office.com/en-us/article/KURT-function-bc3a265c-5da4-4dcb-b7fd-c237789095ab). Maybe you should put your comment as an answer? — DrBwts, Aug 19 '15 at 12:20
@hpaulj thanks! Adding the ddof=1 parameter to the np.std() call was the fix! — skibeats, Aug 19 '15 at 13:36

score 0 · Accepted Answer · answered Aug 19 '15 at 15:49

Rewriting my comment:

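Passing `ddof=1` to `np.std()` switches it from the population standard deviation (divide by n) to the sample standard deviation (divide by n-1), which is what the Excel result is based on. The change in std is usually small, but because the formula divides by `s**4`, small changes in s are amplified.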
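As a minimal sketch of that fix, the question's function could be rewritten along these lines. The name `excel_kurtosis`, the length check, and the sample data are illustrative assumptions, not part of the original post; the vectorized sum follows the suggestion in the comments.

```python
import numpy as np

def excel_kurtosis(returns):
    """Sample excess kurtosis using the formula from the question.

    Hypothetical helper name; the only substantive change from the
    question's code is ddof=1 in the standard deviation.
    """
    returns = np.asarray(returns, dtype=float)
    n = returns.size
    if n < 4:
        # The formula divides by (n-2)*(n-3), so it needs n >= 4.
        raise ValueError("need at least 4 observations")

    avg = returns.mean()
    std = returns.std(ddof=1)  # sample standard deviation (divide by n-1)

    coefficient = n * (n + 1) / ((n - 1) * (n - 2) * (n - 3) * std**4)
    term = 3 * (n - 1) ** 2 / ((n - 2) * (n - 3))
    summation = np.sum((returns - avg) ** 4)  # replaces the for loop

    return coefficient * summation - term

# Illustrative data only (not the asker's returns).
sample = [3, 4, 5, 2, 3, 4, 5, 6, 4, 7]
print(excel_kurtosis(sample))  # approximately -0.1518
```

With the default `ddof=0`, std**2 is smaller by a factor of (n-1)/n, so the first term of the formula is inflated by (n/(n-1))**2. That accounts for the result coming out higher than Excel's.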

hpaulj
