In the post "How does pandas calculate quartiles?", @perl explains how the quantile() function works as follows:
```python
import pandas as pd

df = pd.DataFrame([5,7,10,15,19,21,21,22,22,23,23,23,23,23,24,24,24,24,25], columns=['val'])
```
Let's consider 0.25 (the same logic applies to 0.75, of course): the element number should be (len(df) - 1) * 0.25 = (19 - 1) * 0.25 = 4.5, so we're between element 4 (which is 19; we start counting from 0) and element 5 (which is 21). So we have i = 19, j = 21, fraction = 0.5, and i + (j - i) * fraction = 20.
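One way to check @perl's arithmetic is to redo it by hand next to pandas. This is a minimal sketch that assumes DataFrame.quantile is left at its default interpolation='linear'; the variable names (h, k, fraction, manual) are only illustrative.

```python
import math

import pandas as pd

df = pd.DataFrame([5, 7, 10, 15, 19, 21, 21, 22, 22, 23, 23, 23, 23, 23,
                   24, 24, 24, 24, 25], columns=['val'])

q = 0.25
values = df['val'].sort_values().to_list()  # quantiles are taken over the sorted values
n = len(values)                             # 19 data points

h = (n - 1) * q                  # 0-based position: 18 * 0.25 = 4.5
k = math.floor(h)                # lower neighbour index: 4
fraction = h - k                 # fractional part of the position: 0.5
i, j = values[k], values[k + 1]  # neighbours at positions 4 and 5: 19 and 21
manual = i + (j - i) * fraction  # 19 + (21 - 19) * 0.5

print(manual, df['val'].quantile(q))  # both print 20.0
```

The fraction is not a separate choice: it is just what is left of the position 4.5 after removing its integer part.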
I still can't figure out how the quantile() function works.
The quantile formulas I have seen all use the position q * (n + 1), where q is the quantile to be calculated and n is the number of data points. However, @perl's explanation uses q * (n - 1). Why (n - 1) instead of (n + 1)?
Second, why does @perl use a fraction of 0.5?
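Written out as a formula, @perl's calculation (assuming pandas' default linear interpolation) looks like the block below, with the data sorted and indexed from 0. The fraction f is simply the fractional part of the position h, which is why a position of 4.5 gives 0.5. Rewriting the position 1-based makes the comparison with q(n + 1) easier: the two appear to be different but equally standard conventions (types 7 and 6 in Hyndman and Fan's 1996 classification of sample quantiles; type 7 is also R's default), and they coincide at the median, where 1 + (n - 1)/2 = (n + 1)/2.

```latex
% sorted data x_0 \le x_1 \le \dots \le x_{n-1} (0-based indices)
\begin{aligned}
h &= (n-1)\,q, \qquad k = \lfloor h \rfloor, \qquad f = h - k \\
Q(q) &= x_k + (x_{k+1} - x_k)\, f \\[4pt]
% @perl's example: n = 19, q = 0.25
h &= 18 \cdot 0.25 = 4.5, \quad k = 4, \quad f = 0.5, \quad Q = 19 + (21 - 19)(0.5) = 20 \\[4pt]
% the same position written 1-based, next to the q(n+1) rule
\text{pandas (type 7): } h_{1} &= 1 + (n-1)\,q \qquad \text{vs.} \qquad \text{type 6: } h_{1} = (n+1)\,q
\end{aligned}
```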
Is the quantile calculated differently depending on whether the number of data points is even or odd?
For example, take these two data frames:

```python
import pandas as pd

df1 = pd.DataFrame([2, 4, 6, 8, 10, 12])  # n=6 (even)
df2 = pd.DataFrame([1, 3, 5, 7, 9])       # n=5 (odd)
```
Their respective quantiles are as follows, and I can't work out how they are calculated in either case:
| q | df1 | df2 |
|------|------|-----|
| 0.2 | 4.0 | 2.6 |
| 0.25 | 4.5 | 3.0 |
| 0.5 | 7.0 | 5.0 |
| 0.75 | 9.5 | 7.0 |
| 0.8 | 10.0 | 7.4 |
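To check whether even and odd n are treated differently, the sketch below applies one hand-written rule to both frames and compares it with pandas. It assumes pandas' default interpolation='linear'; linear_quantile is a hypothetical helper written for this illustration, not a pandas function.

```python
import math

import pandas as pd


def linear_quantile(values, q):
    """Hypothetical helper: linear-interpolation quantile meant to mirror
    pandas' default interpolation='linear'."""
    xs = sorted(values)
    h = (len(xs) - 1) * q          # 0-based position; same rule for even and odd n
    k = math.floor(h)
    fraction = h - k
    if k + 1 >= len(xs):           # q == 1.0: the position is the last element
        return float(xs[-1])
    return float(xs[k] + (xs[k + 1] - xs[k]) * fraction)


df1 = pd.DataFrame([2, 4, 6, 8, 10, 12])  # n=6 (even)
df2 = pd.DataFrame([1, 3, 5, 7, 9])       # n=5 (odd)
qs = [0.2, 0.25, 0.5, 0.75, 0.8]

for name, df in [("df1", df1), ("df2", df2)]:
    col = df[0]
    n = len(col)
    for q in qs:
        h = (n - 1) * q
        mine = linear_quantile(col, q)
        theirs = col.quantile(q)
        print(f"{name} q={q:<4} position={h:.2f} -> {mine:.1f} (pandas: {theirs:.1f})")
```

The same formula, with no branch on the parity of n, reproduces every row of the table. For df1 (n = 6) the positions are 5q and for df2 (n = 5) they are 4q; whenever the position is a whole number (q = 0.2 and 0.8 for df1; q = 0.25, 0.5 and 0.75 for df2), no interpolation happens and an actual data point is returned.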
Can someone please explain? I would be very grateful.
Thanks in advance.
Vineet