Sampling from a bounded domain zipf distribution

Question

I'd like to sample from a Zipf distribution restricted to a bounded domain.

That is, assume the domain is {1,...,N}. I'd like each element i of the domain to be chosen with probability proportional to i ** -a, where a is a parameter of the distribution.

numpy provides a zipf sampler (numpy.random.zipf), but it does not allow me to restrict the domain.

How can I easily sample from such a distribution?

If the distribution parameter, a, is larger than 1, I can use the numpy sampler and reject (and re-sample) every sample larger than N. However, numpy.random.zipf samples from the unbounded domain, where the distribution is only defined for a > 1, so it cannot be used with smaller values of a.

When the domain is finite, there should be no problem using such values of a, and that is what I need for my application.
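
For reference, a minimal sketch of the rejection approach that works when a > 1. The function name and its parameters are only illustrative, not part of numpy:

```python
import numpy as np

def bounded_zipf_by_rejection(a, n_max, size, rng=None):
    """Draw `size` samples from {1, ..., n_max} by rejecting larger values.

    Only valid for a > 1, because the underlying unbounded sampler requires it.
    """
    rng = np.random.default_rng() if rng is None else rng
    out = np.empty(size, dtype=np.int64)
    filled = 0
    while filled < size:
        # Draw as many candidates as are still missing, keep those in range.
        draws = rng.zipf(a, size=size - filled)
        kept = draws[draws <= n_max]
        out[filled:filled + kept.size] = kept
        filled += kept.size
    return out

sample = bounded_zipf_by_rejection(a=2.0, n_max=7, size=1000,
                                   rng=np.random.default_rng(0))
```

Calling it with a <= 1 raises an error inside the numpy sampler, which is exactly the limitation described above.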

score 7 · Accepted Answer · answered Oct 25 '15 at 15:08

Using scipy.stats, you could create a custom discrete distribution:

bounded_zipf = stats.rv_discrete(name='bounded_zipf', values=(x, weights))

For example,

import numpy as np
import scipy.stats as stats
import matplotlib.pyplot as plt

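N = 7                               # upper bound of the domain {1, ..., N}
x = np.arange(1, N+1)
a = 1.1                             # exponent; values a <= 1 work here too
weights = x.astype(float) ** (-a)   # float base, so an integer a does not raise
weights /= weights.sum()            # normalize to a probability vector
bounded_zipf = stats.rv_discrete(name='bounded_zipf', values=(x, weights))

sample = bounded_zipf.rvs(size=10000)
plt.hist(sample, bins=np.arange(1, N+2))
plt.show()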

yields a histogram of the sample over {1,...,N}.
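
If you would rather not depend on scipy, the same normalized weights can be passed to numpy directly. This is a sketch assuming a numpy version that provides np.random.default_rng; the seed is arbitrary and only there for reproducibility:

```python
import numpy as np

N = 7
a = 1.1
x = np.arange(1, N + 1)

# Same weights as above: proportional to i ** -a, normalized to sum to 1.
weights = x.astype(float) ** (-a)
weights /= weights.sum()

rng = np.random.default_rng(0)
# Draw from x with the given probabilities; also works for a <= 1.
sample = rng.choice(x, size=10000, p=weights)
```

Since the weights are normalized over a finite set, nothing here requires a > 1.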

score 0 · Answer 2 · answered Oct 27 '15 at 20:20

If sampling performance is an issue, you could implement your own sampling method based on rejection-inversion sampling. You will find a corresponding Java implementation here.
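
To be clear, the sketch below is not rejection-inversion sampling. It is a simpler alternative, inverse-transform sampling on a precomputed CDF, which may already be fast enough when N is moderate. It needs O(N) memory for the table and O(log N) per draw, which is the cost rejection-inversion avoids. The function name and parameters are hypothetical:

```python
import numpy as np

def make_bounded_zipf_sampler(a, n_max, seed=None):
    """Return a function that samples ranks in {1, ..., n_max}
    with probability proportional to rank ** -a."""
    ranks = np.arange(1, n_max + 1, dtype=float)
    cdf = np.cumsum(ranks ** (-a))
    cdf /= cdf[-1]                 # last entry becomes exactly 1.0
    rng = np.random.default_rng(seed)

    def sample(size):
        u = rng.random(size)       # uniform on [0, 1)
        # Index of the first CDF entry strictly greater than u, shifted to a rank.
        return np.searchsorted(cdf, u, side="right") + 1

    return sample

draw = make_bounded_zipf_sampler(a=0.5, n_max=100, seed=0)
sample = draw(5000)
```

The CDF is built once, so repeated calls to draw only pay for the uniform draws and the binary search.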

otmar
