I'm trying to use PyMC 2.3 to obtain an estimate of the parameter of a compound model. By "compound" I mean a random variable that follows a distribution whose whose parameter is another random variable. ("nested" or "hierarchical" are somtimes used to refer to this situation, but I think they are less specific and generate more confusion in this context).
Let make a specific example. The "real" data is generated from a compound distribution that is a Poisson with a parameter that is distributed as an Exponential. In plain scipy the data is generated as follows:
import numpy as np
from scipy.stats import distributions
np.random.seed(3) # for repeatability
nsamples = 1000
tau_true = 50
orig_lambda_sample = distributions.expon(scale=tau_true).rvs(nsamples)
data = distributions.poisson(orig_lambda_sample).rvs(nsamples)
I want to obtain an estimate of the model parameter tau_true
.
My approach so far in modelling this problem in PyMC is the following:
tau = pm.Uniform('tau', 0, 100)
count_rates = pm.Exponential('count_rates', beta=1/tau, size=nsamples)
counts = pm.Poisson('counts', mu=count_rates, value=data, observed=True)
Note that I use size=nsamples
to have a new stochastic variable for each sample.
Finally I run the MCMC as:
model = pm.Model([count_rates, counts, tau])
mcmc = pm.MCMC(model)
mcmc.sample(40000, 10000)
The model converges (although slowly, > 10^5 iterations) to a distribution centred around 50 (tau_true
). However it seems like an overkill to define 1000 stochastic variables to fit a single distribution with a single parameter.
Is there a better way to describe a compound model in PyMC?
PS I also tried with a more informative prior (tau = pm.Normal('tau', mu=51, tau=1/2**2)
) but the results are similar and the convergence is still slow.