
I have some data that looks like this:

[image: my data]

I want to model this data with a two-component Poisson mixture model. Since I am new to PyMC3, I followed the PyMC3 GMM tutorial and the PyMC3 Mixture API documentation to try to do this. My code is below:

import numpy as np
import pymc3 as pm
import matplotlib.pyplot as plt

# `data` and SEED are defined earlier (not shown)

with pm.Model() as model:
    # Exponential priors on the two Poisson rates
    lam1 = pm.Exponential('lam1', lam=1)
    lam2 = pm.Exponential('lam2', lam=1)
    # Component distributions (not registered in the model directly)
    pois1 = pm.Poisson.dist(mu=lam1)
    pois2 = pm.Poisson.dist(mu=lam2)
    # Mixture weights
    w = pm.Dirichlet('w', a=np.array([1, 1]))
    like = pm.Mixture('like', w=w, comp_dists=[pois1, pois2], observed=data)

with model:
    trace = pm.sample(5000, n_init=10000, tune=10000, random_seed=SEED)[1000:]

with model:
    ppc_trace = pm.sample_ppc(trace, 5000, random_seed=SEED)

fig, ax = plt.subplots(figsize=(8, 6))
ax.hist(data, bins=30, normed=True,
        histtype='step', lw=2,
        label='Observed data')
ax.hist(ppc_trace['like'], bins=30, normed=True,
        histtype='step', lw=2,
        label='Posterior predictive distribution')
ax.legend(loc=1)
plt.show()

But the result is this: [image: bad result]

How do I improve the fit? I have tried fiddling with the lambdas to no avail.

ilikecats

1 Answer


It looks like you are plotting the posterior of the parameters. Don't confuse this with the distribution of random samples drawn from the model. With that in mind, it should be clear that the orange line is not a "fit" of the blue one but rather indicates where the parameters of your components lie. I am not familiar with PyMC, but there should also be a way to sample data from the model itself rather than only sampling its parameters. One alternative is to read off the maximum of the posterior (the MAP solution) and use those values with numpy.random functions to draw samples.
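For illustration, here is a minimal sketch of that last idea, assuming the `model` from the question is in scope and that `pm.find_MAP` returns the untransformed values of `lam1`, `lam2`, and `w` (names taken from the question):

import numpy as np
import pymc3 as pm

# Point estimates of the parameters at the posterior mode
with model:
    map_estimate = pm.find_MAP()

lam1_hat = map_estimate['lam1']
lam2_hat = map_estimate['lam2']
w_hat = map_estimate['w']          # mixture weights, should sum to 1

# Draw synthetic data from the fitted mixture:
# pick a component for each draw, then sample from that Poisson.
n_draws = 5000
components = np.random.choice(2, size=n_draws, p=w_hat)
samples = np.where(components == 0,
                   np.random.poisson(lam1_hat, size=n_draws),
                   np.random.poisson(lam2_hat, size=n_draws))

Plotting a histogram of `samples` next to the observed data then gives a quick sanity check of how well the MAP parameters reproduce the data.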

Jojo