
I've just started with Julia and I am trying to do some simple statistics.

I'm using the StatsBase package and am trying to calculate quantiles.

using StatsBase

lst = 1:10

print(nquantile(lst, 4))

and get

[1.0, 3.25, 5.5, 7.75, 10.0]

Here I assume Q_1 = 3.25 and Q_3 = 7.75.

Running similar code in Python:

from statistics import quantiles

lst = list(range(1, 11))
print(quantiles(lst))

yields:

[2.75, 5.5, 8.25]

Here Q_1 = 2.75 and Q_3 = 8.25.

As far as I understand the statistics, Python's results are the mathematically correct ones.

So my guess is that the Julia version uses some kind of Gaussian distribution to find the quantiles. If so, is there a way to make it use a uniform distribution instead?

  • There are many ways to calculate quantiles. Your guess about underlying distributions is wrong. Have you looked at the help pages? https://docs.julialang.org/en/v1/stdlib/Statistics/ Wikipedia also provides [a helpful table](https://en.wikipedia.org/wiki/Quantile#Estimating_quantiles_from_a_sample) (linked on the Julia help page). – Gregor Thomas Sep 13 '22 at 13:55
  • (The Python `statistics` package offers 2 methods with a pretty reductive explanation in the docs. Julia and R offer more options and nicer references. If you don't feel like reading the paper cited in the Julia and R docs, I'd recommend the wikipedia link above and maybe the [R documentation link](https://www.rdocumentation.org/packages/stats/versions/3.6.2/topics/quantile) to better understand what's going on. More options are also available in [numpy.quantile](https://numpy.org/doc/stable/reference/generated/numpy.quantile.html).) – Gregor Thomas Sep 13 '22 at 13:59
  • `numpy`, Julia, and R all have the same default, which is referred to as "linear" in the `numpy` docs. The `statistics.quantiles` function is labeled as "Weibull" in the `numpy` docs. – Gregor Thomas Sep 13 '22 at 14:02
  • Oops, I see now I was using the `Statistics.quantile` Julia function, not your `StatsBase.nquantile`, which doesn't seem to have the method options. I'd suggest using `Statistics.quantile` directly. – Gregor Thomas Sep 13 '22 at 14:14
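For comparison, Python's own `statistics.quantiles` exposes both conventions through its `method` argument. A quick check on the question's data (assuming Python 3.8 or later, where `method` exists) shows that the default "exclusive" method gives the Python numbers and "inclusive" gives the Julia ones:

```python
from statistics import quantiles

data = list(range(1, 11))

# Default method="exclusive": Hyndman and Fan Definition 6 ("Weibull" in the numpy docs)
print(quantiles(data, n=4))                      # [2.75, 5.5, 8.25]

# method="inclusive": Definition 7, the Julia/R/NumPy "linear" default
print(quantiles(data, n=4, method="inclusive"))  # [3.25, 5.5, 7.75]
```

Neither result is "wrong"; they are two different sample-quantile definitions applied to the same data.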


There are many quantile definitions. Julia's Statistics.quantile supports the continuous ones (Def. 4 to Def. 9) from Hyndman, R.J. and Fan, Y. (1996) "Sample Quantiles in Statistical Packages", The American Statistician, Vol. 50, No. 4, pp. 361-365.
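For reference, these continuous definitions share one interpolation formula. Below is a sketch of it in terms of the alpha/beta keywords, assuming they correspond to the plotting positions p_k = (k - alpha)/(n + 1 - alpha - beta) as in the R documentation; x_(k) is the k-th smallest of the n sample values:

```latex
h = np + \alpha + p\,(1 - \alpha - \beta), \qquad j = \lfloor h \rfloor, \qquad \gamma = h - j,
\\
\hat{Q}(p) = (1 - \gamma)\, x_{(j)} + \gamma\, x_{(j+1)},
\\
\alpha = \beta = 1 \;\Rightarrow\; h = (n - 1)p + 1 \quad \text{(Def. 7)}, \qquad
\alpha = \beta = 0 \;\Rightarrow\; h = (n + 1)p \quad \text{(Def. 6)}.
```

At the ends, j is clamped to 1..n-1 and gamma to [0, 1], so p = 0 and p = 1 return the minimum and maximum.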

To get Python's result, do:

julia> using Statistics

julia> quantile(1:10, (0:4)/4; alpha=0, beta=0)
5-element Vector{Float64}:
  1.0
  2.75
  5.5
  8.25
 10.0
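To see numerically what alpha and beta do, here is a minimal Python sketch. It is a hypothetical helper `hf_quantile`, not Julia's source code, and it assumes the Hyndman and Fan position h = n*p + alpha + p*(1 - alpha - beta), with the index clamped at the ends. It reproduces both the alpha = beta = 0 output above and Julia's default:

```python
import math


def hf_quantile(data, p, alpha=1.0, beta=1.0):
    """Hypothetical helper: sample quantile via the Hyndman-Fan alpha/beta form.

    alpha = beta = 1 -> Definition 7 (Julia/R/NumPy default)
    alpha = beta = 0 -> Definition 6 (Python statistics.quantiles default)
    """
    if not 0 <= p <= 1:
        raise ValueError("p must lie in [0, 1]")
    x = sorted(data)
    n = len(x)
    if n == 1:
        return float(x[0])
    # Fractional 1-based position of the quantile in the sorted sample
    h = n * p + alpha + p * (1 - alpha - beta)
    # Clamp so that x[j - 1] and x[j] (0-based neighbours) always exist
    j = min(max(math.floor(h), 1), n - 1)
    gamma = min(max(h - j, 0.0), 1.0)
    # Linear interpolation between the two neighbouring order statistics
    return x[j - 1] + gamma * (x[j] - x[j - 1])


data = range(1, 11)
probs = [k / 4 for k in range(5)]
print([hf_quantile(data, p, alpha=0, beta=0) for p in probs])  # [1.0, 2.75, 5.5, 8.25, 10.0]
print([hf_quantile(data, p) for p in probs])                   # [1.0, 3.25, 5.5, 7.75, 10.0]
```

For real work, call Statistics.quantile directly; the sketch only illustrates the arithmetic behind the keywords.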

Explanation (from the docstrings):

help?> nquantile
(...)
  Equivalent to quantile(x, [0:n]/n). 
(...)
help?> quantile

  quantile(itr, p; sorted=false, alpha::Real=1.0, beta::Real=alpha)
(...)
  By default (alpha = beta = 1), quantiles are computed via linear interpolation between the points ((k-1)/(n-1),
  v[k]), for k = 1:n where n = length(itr). This corresponds to Definition 7 of Hyndman and Fan (1996), and is the same as the R and NumPy default.

(...)
    •  Def. 6: alpha=0, beta=0 (Excel PERCENTILE.EXC, Python default, Stata altdef)
(...)
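Working the docstring's description by hand for the question's data (n = 10, x_(k) = k, p = 0.25) shows where the two first-quartile values come from. Def. 7 places x_(k) at (k-1)/(n-1), so its position is h = (n-1)p + 1. For Def. 6 this assumes the plotting position k/(n+1), i.e. h = (n+1)p:

```latex
\hat{Q}(p) = x_{(\lfloor h \rfloor)} + (h - \lfloor h \rfloor)\,\bigl(x_{(\lfloor h \rfloor + 1)} - x_{(\lfloor h \rfloor)}\bigr)
\\
\text{Def. 7: } h = 9(0.25) + 1 = 3.25 \;\Rightarrow\; \hat{Q}(0.25) = x_{(3)} + 0.25\,(x_{(4)} - x_{(3)}) = 3.25
\\
\text{Def. 6: } h = 11(0.25) = 2.75 \;\Rightarrow\; \hat{Q}(0.25) = x_{(2)} + 0.75\,(x_{(3)} - x_{(2)}) = 2.75
```

The same steps at p = 0.75 give h = 7.75 and h = 8.25, matching the two third quartiles.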
Przemyslaw Szufel