How can I calculate the kurtosis of already binned data?

Question

Does anyone know how to calculate the kurtosis of a distribution from binned data alone using Python?

I have a histogram of a distribution, but not the raw data. There are two columns: one with the bin value and one with the count for that bin. I need to calculate the kurtosis of the distribution.

If I had the raw data, I could use scipy.stats.kurtosis, but I can't see anything in its documentation for working from binned data. https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.kurtosis.html

scipy.stats.binned_statistic can compute a statistic such as kurtosis within each bin, but it also needs the raw data and only gives per-bin values, not the kurtosis of the whole histogram. https://docs.scipy.org/doc/scipy-0.16.0/reference/generated/scipy.stats.binned_statistic.html

Edit: Example data. I could try and resample from this to create my own dummy raw data, but I have about 140k of these to run each day and was hoping for something built-in.

Index,Bin,Count
 0, 730, 30
 1, 735, 45
 2, 740, 41
 3, 745, 62
 4, 750, 80
 5, 755, 96
 6, 760, 94
 7, 765, 90
 8, 770, 103
 9, 775, 96
10, 780, 95
11, 785, 109
12, 790, 102
13, 795, 99
14, 800, 93
15, 805, 101
16, 810, 109
17, 815, 98
18, 820, 89
19, 825, 62
20, 830, 71
21, 835, 69
22, 840, 58
23, 845, 50
24, 850, 42

A "bin" usually has a left end and a right end. Does your "bin number" correspond to one of the ends of the interval associated with each count? — Warren Weckesser, Feb 01 '19 at 05:28
@WarrenWeckesser The bin in the example data corresponds to the central value of the bin. So 800 represents the range 797.5 to 802.5 — KWx, Feb 05 '19 at 01:06

score 2 · Accepted Answer · answered Jan 29 '19 at 22:57

You can just calculate the statistics directly. If x holds the bin values and y holds the counts for each bin, then the expected value of f(x) is np.sum(y*f(x))/np.sum(y). We can use this to translate the formula for kurtosis into the following code:

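import numpy as np

# x: bin values, y: counts per bin (NumPy arrays of equal length)
total = np.sum(y)
mean = np.sum(y * x) / total
variance = np.sum(y * (x - mean)**2) / total
# Fourth central moment divided by the squared variance (non-excess kurtosis)
kurtosis = np.sum(y * (x - mean)**4) / (variance**2 * total)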

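Note that kurtosis and excess kurtosis are not the same thing: excess kurtosis is kurtosis minus 3, and scipy.stats.kurtosis returns the excess value by default (fisher=True). The code above gives the non-excess value.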
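For a quick sanity check, here is a minimal sketch that wraps the formula above in a function and runs it on the example data from the question. The function name binned_kurtosis and its excess flag are made up for illustration. It assumes the Bin column holds bin centers and treats every count as if it sat exactly at its center, so the spread within each bin is ignored.

```python
import numpy as np

def binned_kurtosis(centers, counts, excess=False):
    """Kurtosis of a histogram, placing every count at its bin center.

    Hypothetical helper; returns the non-excess value unless excess=True.
    """
    x = np.asarray(centers, dtype=float)
    y = np.asarray(counts, dtype=float)
    total = y.sum()
    mean = (y * x).sum() / total
    variance = (y * (x - mean) ** 2).sum() / total
    fourth = (y * (x - mean) ** 4).sum() / total
    kurt = fourth / variance ** 2
    return kurt - 3.0 if excess else kurt

# Example data from the question: bin centers 730..850 in steps of 5.
centers = np.arange(730, 855, 5)
counts = np.array([30, 45, 41, 62, 80, 96, 94, 90, 103, 96, 95, 109, 102,
                   99, 93, 101, 109, 98, 89, 62, 71, 69, 58, 50, 42])

k = binned_kurtosis(centers, counts)

# Cross-check: expand the histogram into pseudo-raw data and recompute
# with population (ddof=0) moments.
raw = np.repeat(centers, counts)
k_raw = np.mean((raw - raw.mean()) ** 4) / np.var(raw) ** 2

print(k, k_raw)  # the two values should agree up to rounding
```

The np.repeat expansion is only there to confirm the weighted formula; with many histograms per day the weighted version avoids building the expanded arrays. scipy.stats.kurtosis(raw, fisher=False) with its default bias=True should give the same number.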

Thanks. That does make sense. I'll do some research to see if I can implement that using groupby. — KWx, Feb 01 '19 at 02:44
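One possible way to do this with groupby, as a sketch only: it assumes all histograms are stacked in one long pandas DataFrame with a hypothetical hist_id column identifying each histogram, plus bin and count columns. None of these names come from the question.

```python
import pandas as pd

def grouped_binned_kurtosis(df, key="hist_id", bin_col="bin", count_col="count"):
    """Non-excess kurtosis per histogram; column names are assumptions."""
    keys = df[key]
    w = df[count_col].astype(float)
    x = df[bin_col].astype(float)
    # Per-histogram weighted mean, broadcast back onto every row.
    total = w.groupby(keys).transform("sum")
    mean = (w * x).groupby(keys).transform("sum") / total
    dev = x - mean
    # Per-histogram weighted central moments.
    n = w.groupby(keys).sum()
    variance = (w * dev ** 2).groupby(keys).sum() / n
    fourth = (w * dev ** 4).groupby(keys).sum() / n
    return fourth / variance ** 2

# Two tiny made-up histograms, small enough to check by hand.
df = pd.DataFrame({
    "hist_id": ["a", "a", "b", "b", "b"],
    "bin": [-1, 1, -2, 0, 2],
    "count": [1, 1, 1, 2, 1],
})
result = grouped_binned_kurtosis(df)
print(result)  # a -> 1.0, b -> 2.0
```

The result is a Series indexed by hist_id. Everything is computed with grouped sums, so there is no Python-level loop over the individual histograms.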
