
In the post "How does pandas calculate quartiles?", @perl gives this explanation of how the quantile() function works:

import pandas as pd

df = pd.DataFrame([5,7,10,15,19,21,21,22,22,23,23,23,23,23,24,24,24,24,25], columns=['val'])

Let's consider 0.25 (the same logic applies to 0.75, of course): the element number should be (len(df)-1)*0.25 = (19 - 1)*0.25 = 4.5, so we're between element 4 (which is 19; we start counting from 0) and element 5 (which is 21). So we have i = 19, j = 21, fraction = 0.5, and i + (j - i) * fraction = 20.
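The arithmetic in @perl's explanation can be sketched in plain Python. This is a minimal sketch assuming pandas' default `interpolation='linear'`; the helper name `linear_quantile` is hypothetical and not part of pandas. Note that `fraction` is simply the fractional part of the 0-based position (4.5 → 0.5).

```python
import math

def linear_quantile(values, q):
    """Hypothetical helper: linear interpolation at 0-based position (n - 1) * q."""
    xs = sorted(values)
    pos = (len(xs) - 1) * q          # e.g. (19 - 1) * 0.25 = 4.5
    i = math.floor(pos)              # lower neighbour (0-based index)
    j = min(i + 1, len(xs) - 1)      # upper neighbour, clamped at the last element
    fraction = pos - i               # fractional part of the position (0.5 here)
    return xs[i] + (xs[j] - xs[i]) * fraction

data = [5, 7, 10, 15, 19, 21, 21, 22, 22, 23, 23, 23, 23, 23, 24, 24, 24, 24, 25]
print(linear_quantile(data, 0.25))   # 20.0, i.e. 19 + (21 - 19) * 0.5
```

With pandas installed, `df['val'].quantile(0.25)` on the same data should agree with this value.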

I still cannot figure out how the quantile() function works.

The quantile formulas I have come across use the position q * (n+1), where q is the quantile to be calculated. However, @perl's explanation uses q * (n-1). Why (n-1) instead of (n+1)?
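One way to compare the two rules is to express both as 1-based positions: (n-1)*q counted from 0 is the same as 1 + (n-1)*q counted from 1 (R calls this type 7), while the textbook rule places the quantile at (n+1)*q (R's type 6). The positions differ by 2q - 1, so they coincide only for the median. The sketch below assumes linear interpolation between neighbours for both rules and clamps out-of-range positions; all function names are hypothetical.

```python
import math

def interpolate_1based(xs, h):
    """Linearly interpolate sorted xs at 1-based position h, clamped to [1, n]."""
    n = len(xs)
    h = min(max(h, 1), n)             # positions outside the data snap to the ends
    lo = math.floor(h)
    hi = min(lo + 1, n)
    return xs[lo - 1] + (h - lo) * (xs[hi - 1] - xs[lo - 1])

def quantile_n_minus_1(values, q):
    xs = sorted(values)
    return interpolate_1based(xs, 1 + (len(xs) - 1) * q)   # same as 0-based (n-1)*q

def quantile_n_plus_1(values, q):
    xs = sorted(values)
    return interpolate_1based(xs, (len(xs) + 1) * q)       # textbook q*(n+1) rule

data = [5, 7, 10, 15, 19, 21, 21, 22, 22, 23, 23, 23, 23, 23, 24, 24, 24, 24, 25]
for q in (0.25, 0.5, 0.75):
    print(q, quantile_n_minus_1(data, q), quantile_n_plus_1(data, q))
```

On this data the two rules give 20.0 vs 19.0 at q = 0.25 and 23.5 vs 24.0 at q = 0.75, while both return 23.0 for the median.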

Secondly, why is the fraction 0.5 being used by @perl?

Does the method of calculating quantiles differ depending on whether the number of data points is even or odd?

For example, take these two DataFrames:

df1 = pd.DataFrame([2,4,6,8,10,12]) # n=6 (even)

df2 = pd.DataFrame([1,3,5,7,9]) # n=5 (odd)

Their respective quantiles are shown below, and I cannot work out how they are calculated in either case:

| q    | df1 (n=6) | df2 (n=5) |
|------|-----------|-----------|
| 0.2  | 4.0       | 2.6       |
| 0.25 | 4.5       | 3.0       |
| 0.5  | 7.0       | 5.0       |
| 0.75 | 9.5       | 7.0       |
| 0.8  | 10.0      | 7.4       |
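Applying the same 0-based position (n-1)*q to both lists reproduces every value in the table, and the formula itself has no separate branch for even or odd n. This is a minimal pure-Python sketch assuming pandas' default linear interpolation; `linear_quantile` is a hypothetical helper, and `round` only hides floating-point noise such as 2.6000000000000005.

```python
import math

def linear_quantile(values, q):
    """Hypothetical helper: linear interpolation at 0-based position (n - 1) * q."""
    xs = sorted(values)
    pos = (len(xs) - 1) * q
    i = math.floor(pos)                  # lower neighbour
    j = min(i + 1, len(xs) - 1)          # upper neighbour, clamped
    return xs[i] + (xs[j] - xs[i]) * (pos - i)

df1_vals = [2, 4, 6, 8, 10, 12]   # n = 6 (even): position = 5 * q
df2_vals = [1, 3, 5, 7, 9]        # n = 5 (odd):  position = 4 * q

for q in (0.2, 0.25, 0.5, 0.75, 0.8):
    print(q, round(linear_quantile(df1_vals, q), 10), round(linear_quantile(df2_vals, q), 10))
```

For instance, q = 0.2 on df2 gives position 4 * 0.2 = 0.8, so the result is 1 + (3 - 1) * 0.8 = 2.6.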

Can someone please explain? I would be very grateful.

Thanks in advance.

Vineet


1 Answer


I am not sure, but you can try this.

0 <= q <= 1

import pandas as pd

df = pd.DataFrame([1,3,5,7,9], columns=['val'])

df.quantile(0.25)

Output: val 3.0

Explanation: n = 5 and q = 0.25, so the index is n * q = n/4 = 1.25.

Rules for the index:

  • if the fractional part of index is less than 0.50 (e.g. 0.25), then index = floor(index)
  • if the fractional part of index is greater than 0.50, then index = ceil(index)
  • if the fractional part of index equals 0.50, then value = int(index) + 0.5
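The rounding rule above can be sketched as follows so it can be checked against the table in the question. Two readings are assumptions, not stated in the answer: the index n * q is taken as 0-based (the only reading that returns 3 for the example), and the "== 0.50" case is read as averaging the two neighbouring elements. The function name is hypothetical.

```python
import math

def answer_rule_quantile(values, q):
    """Hypothetical sketch of this answer's rounding rule (0-based index = n * q)."""
    xs = sorted(values)
    index = len(xs) * q
    frac = index - math.floor(index)       # fractional part of the index
    if frac < 0.5:
        k = math.floor(index)
    elif frac > 0.5:
        k = math.ceil(index)
    else:
        # Assumed reading of "int(index) + 0.5": midpoint of the two neighbours.
        lo = int(index)
        hi = min(lo + 1, len(xs) - 1)
        return (xs[lo] + xs[hi]) / 2
    return xs[min(k, len(xs) - 1)]         # clamp so q = 1.0 stays in range

vals = [1, 3, 5, 7, 9]
print(answer_rule_quantile(vals, 0.25))  # 3, matching df.quantile(0.25) above
print(answer_rule_quantile(vals, 0.2))   # 3, whereas the question's table lists 2.6
```

So under these readings the rule reproduces the q = 0.25 example, but it does not appear to match every value in the question's table.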
Mamun Or Rashid