I am currently using NumPy for the following task: I have a large grid of values, and I need to take a multinomial sample at each point. The probability vector for the multinomial varies from grid site to grid site, so NumPy's multinomial function doesn't quite work for me, since it makes all of its draws from the same distribution. Iterating over each site seems incredibly inefficient, and I was wondering if there is a way to do this efficiently in NumPy.

It appears that something like this is possible (and fast) if I use Theano (see this answer), but that would require quite a bit of rewriting, which I would ideally like to avoid. Can multinomial sampling be vectorized efficiently in basic NumPy?
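For concreteness, here is a minimal sketch of the per-site loop I am trying to avoid (the grid shape, number of categories, and trial count are just illustrative, not from my actual problem):

import numpy as np

# Illustrative sizes only: a 100x100 grid, 5 categories, 10 trials per site.
grid_shape = (100, 100)
n_categories = 5
n_trials = 10

# One probability vector per grid site, normalized along the last axis.
p = np.random.dirichlet(np.ones(n_categories), size=grid_shape)

# Naive approach: one call to np.random.multinomial per grid site.
out = np.zeros(grid_shape + (n_categories,), dtype=int)
for i in range(grid_shape[0]):
    for j in range(grid_shape[1]):
        out[i, j] = np.random.multinomial(n_trials, p[i, j])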
Edit:
It is easy to modify Warren's code to allow different counts at different sites, as I have found I need: all one needs to do is pass in the full count array and remove the first line, like so:
import numpy as np


def multinomial_rvs(count, p):
    """
    Sample from the multinomial distribution with multiple p vectors.

    * count must be an (n-1)-dimensional numpy array.
    * p must be an n-dimensional numpy array, n >= 1.  The last axis of p
      holds the sequence of probabilities for a multinomial distribution.

    The return value has the same shape as p.
    """
    # Work on a copy so the caller's count array is not modified in place.
    count = count.copy()
    out = np.zeros(p.shape, dtype=int)
    ps = p.cumsum(axis=-1)
    # Conditional probabilities: condp[..., i] = p[..., i] / (p[..., 0] + ... + p[..., i])
    with np.errstate(divide='ignore', invalid='ignore'):
        condp = p / ps
    condp[np.isnan(condp)] = 0.0
    # Work backwards over the categories, drawing a binomial sample for
    # category i from the trials that remain at each site.
    for i in range(p.shape[-1]-1, 0, -1):
        binsample = np.random.binomial(count, condp[..., i])
        out[..., i] = binsample
        count -= binsample
    out[..., 0] = count
    return out
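For reference, a small usage sketch continuing from the code above (the grid shape, number of categories, and trial counts are illustrative, not part of the original):

# Illustrative: a 50x50 grid, 4 categories, a different trial count at each site.
p = np.random.dirichlet(np.ones(4), size=(50, 50))    # shape (50, 50, 4)
count = np.random.randint(1, 20, size=(50, 50))       # shape (50, 50)

sample = multinomial_rvs(count, p)                     # shape (50, 50, 4)
# Each site's category counts sum to that site's trial count.
assert np.all(sample.sum(axis=-1) == count)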