Yule–Simon distribution
In probability and statistics, the Yule–Simon distribution is a discrete probability distribution named after Udny Yule and Herbert A. Simon. Simon originally called it the Yule distribution.
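Figure (probability mass function): Yule–Simon PMF on a log-log scale. (The function is defined only at integer values of $k$; the connecting lines do not indicate continuity.)
Figure (cumulative distribution function): Yule–Simon CDF. (The function is defined only at integer values of $k$; the connecting lines do not indicate continuity.)

Parameters | $\rho > 0$ shape (real)
---|---
Support | $k \in \{1, 2, \dotsc\}$
PMF | $\rho \operatorname{B}(k, \rho + 1)$
CDF | $1 - k \operatorname{B}(k, \rho + 1)$
Mean | $\frac{\rho}{\rho - 1}$ for $\rho > 1$
Mode | $1$
Variance | $\frac{\rho^2}{(\rho - 1)^2 (\rho - 2)}$ for $\rho > 2$
Skewness | $\frac{(\rho + 1)^2 \sqrt{\rho - 2}}{(\rho - 3)\rho}$ for $\rho > 3$
Ex. kurtosis | $\rho + 3 + \frac{11\rho^3 - 49\rho - 22}{(\rho - 4)(\rho - 3)\rho}$ for $\rho > 4$
MGF | does not exist
CF | $\frac{\rho}{\rho + 1}\,{}_2F_1(1, 1; \rho + 2; e^{it})\,e^{it}$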
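The probability mass function (pmf) of the Yule–Simon (ρ) distribution is

$$f(k;\rho) = \rho \operatorname{B}(k, \rho + 1),$$

for integer $k \geq 1$ and real $\rho > 0$, where $\operatorname{B}$ is the beta function. Equivalently, the pmf can be written in terms of the rising factorial as

$$f(k;\rho) = \frac{\rho\,\Gamma(\rho + 1)}{(k)_{\rho + 1}},$$

where $\Gamma$ is the gamma function and $(k)_{\rho + 1} = \Gamma(k + \rho + 1)/\Gamma(k)$. Thus, if $\rho$ is an integer,

$$f(k;\rho) = \frac{\rho\,\rho!\,(k - 1)!}{(k + \rho)!}.$$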
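As a minimal sketch, the beta-function form $f(k;\rho) = \rho \operatorname{B}(k, \rho + 1)$ can be evaluated through the log-gamma function, which avoids overflow for large $k$. The function names `yule_simon_pmf` and `yule_simon_pmf_integer_rho` are illustrative and do not come from any library; only Python's standard `math` module is assumed.

```python
import math


def yule_simon_pmf(k: int, rho: float) -> float:
    """Evaluate f(k; rho) = rho * B(k, rho + 1) for integer k >= 1 and rho > 0."""
    if k < 1 or rho <= 0:
        raise ValueError("requires integer k >= 1 and rho > 0")
    # log B(k, rho + 1) = lgamma(k) + lgamma(rho + 1) - lgamma(k + rho + 1)
    log_beta = math.lgamma(k) + math.lgamma(rho + 1) - math.lgamma(k + rho + 1)
    return rho * math.exp(log_beta)


def yule_simon_pmf_integer_rho(k: int, rho: int) -> float:
    """Factorial form rho * rho! * (k - 1)! / (k + rho)!, valid for integer rho >= 1."""
    if k < 1 or rho < 1:
        raise ValueError("requires integers k >= 1 and rho >= 1")
    return rho * math.factorial(rho) * math.factorial(k - 1) / math.factorial(k + rho)


if __name__ == "__main__":
    # Compare the two forms for an integer shape parameter.
    for k in (1, 2, 3, 10):
        print(k, yule_simon_pmf(k, 2.0), yule_simon_pmf_integer_rho(k, 2))
```

For $\rho = 1$ the factorial form reduces to $1/(k(k+1))$, which gives a quick sanity check on either function.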
The parameter $\rho$ can be estimated using a fixed-point algorithm.
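One way such an iteration can arise is sketched below; it is not necessarily the algorithm the text refers to. Setting the derivative of the log-likelihood of $f(k;\rho) = \rho \operatorname{B}(k, \rho + 1)$ to zero for samples $k_1, \dots, k_N$ gives $N/\rho = \sum_i \sum_{j=1}^{k_i} 1/(\rho + j)$, which suggests iterating $\rho \leftarrow N / \sum_i \sum_{j=1}^{k_i} 1/(\rho + j)$. The names `harmonic_sum` and `fit_yule_simon_rho`, the starting value, and the stopping rule are assumptions made for illustration.

```python
def harmonic_sum(samples, rho):
    """Return sum over samples k_i of sum_{j=1}^{k_i} 1 / (rho + j)."""
    return sum(1.0 / (rho + j) for k in samples for j in range(1, k + 1))


def fit_yule_simon_rho(samples, rho0=1.0, tol=1e-10, max_iter=10_000):
    """Fixed-point iteration rho <- N / harmonic_sum(samples, rho).

    A sketch of a maximum-likelihood estimate under the assumptions stated
    in the lead-in; rho0, tol and max_iter are illustrative defaults.
    """
    n = len(samples)
    if n == 0 or any(k < 1 for k in samples):
        raise ValueError("samples must be a non-empty list of integers >= 1")
    if all(k == 1 for k in samples):
        # With every sample equal to 1 the update is rho <- rho + 1,
        # so no finite fixed point exists.
        raise ValueError("no finite estimate when every sample equals 1")
    rho = rho0
    for _ in range(max_iter):
        new_rho = n / harmonic_sum(samples, rho)
        if abs(new_rho - rho) <= tol * max(1.0, abs(rho)):
            return new_rho
        rho = new_rho
    raise RuntimeError("fixed-point iteration did not converge")


if __name__ == "__main__":
    data = [1, 1, 1, 2, 1, 3, 1, 1, 5, 2]
    print(fit_yule_simon_rho(data))
```

The double sum costs time proportional to the total of the samples per iteration, which is fine for small data sets; for heavy-tailed data with very large counts one would likely replace the inner sum with a difference of digamma values.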
The probability mass function $f$ has the property that for sufficiently large $k$ we have

$$f(k;\rho) \approx \frac{\rho\,\Gamma(\rho + 1)}{k^{\rho + 1}} \propto \frac{1}{k^{\rho + 1}}.$$

This means that the tail of the Yule–Simon distribution is a realization of Zipf's law: $f(k;\rho)$ can be used to model, for example, the relative frequency of the $k$th most frequent word in a large collection of text, which according to Zipf's law is inversely proportional to a (typically small) power of $k$.
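The power-law tail can be checked numerically. The sketch below compares $f(k;\rho) = \rho \operatorname{B}(k, \rho + 1)$ with the approximation $\rho\,\Gamma(\rho + 1)/k^{\rho + 1}$ and estimates the local slope of the pmf on a log-log scale, which should approach $-(\rho + 1)$. The helper names `log_pmf`, `tail_ratio` and `local_slope` are illustrative, and the choice $\rho = 1.5$ is arbitrary.

```python
import math


def log_pmf(k: int, rho: float) -> float:
    """Natural log of f(k; rho) = rho * B(k, rho + 1), computed via log-gamma."""
    return (math.log(rho) + math.lgamma(k) + math.lgamma(rho + 1)
            - math.lgamma(k + rho + 1))


def tail_ratio(k: int, rho: float) -> float:
    """Exact pmf divided by the approximation rho * Gamma(rho + 1) / k**(rho + 1)."""
    log_approx = math.log(rho) + math.lgamma(rho + 1) - (rho + 1) * math.log(k)
    return math.exp(log_pmf(k, rho) - log_approx)


def local_slope(k: int, rho: float) -> float:
    """Slope of log f against log k, estimated between k and 2k."""
    return (log_pmf(2 * k, rho) - log_pmf(k, rho)) / math.log(2)


if __name__ == "__main__":
    rho = 1.5
    for k in (10, 100, 1_000, 10_000):
        # The ratio tends to 1 and the slope to -(rho + 1) as k grows.
        print(k, tail_ratio(k, rho), local_slope(k, rho))
```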