I've been following the Gaussian mixture model example for PyMC3 here: https://github.com/pymc-devs/pymc3/blob/master/pymc3/examples/gaussian_mixture_model.ipynb and have got it working nicely with an artificial dataset.

I've tried it with a real dataset, and I'm struggling to get it to give sensible results.

Any ideas of which parameters I should be looking to narrow/widen/change in order to get a better fit? The traces seem to be stable. Here's a snippet of my model which I've adjusted from the example:

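import numpy as np
import pymc3 as pm
import theano.tensor as tt

# `data` is the 1-D array of observations, loaded earlier (not shown)
k = 3              # number of clusters, matching the three prior means below
ndata = len(data)  # number of observations

model = pm.Model()
with model:
    # cluster sizes: flat Dirichlet prior over the mixture weights
    a = np.array([1., 1., 1.])
    p = pm.Dirichlet('p', a=a, shape=k)
    # ensure all clusters have some points (reject any weight below 0.1)
    p_min_potential = pm.Potential('p_min_potential', tt.switch(tt.min(p) < .1, -np.inf, 0))

    # cluster centers
    means = pm.Normal('means', mu=[0, 1.5, 3], sd=1, shape=k)
    # break symmetry by forcing means[0] <= means[1] <= means[2]
    order_means_potential = pm.Potential('order_means_potential',
                                         tt.switch(means[1] - means[0] < 0, -np.inf, 0)
                                         + tt.switch(means[2] - means[1] < 0, -np.inf, 0))

    # measurement error (one standard deviation per cluster)
    sd = pm.Uniform('sd', lower=0, upper=2, shape=k)

    # latent cluster of each observation
    category = pm.Categorical('category', p=p, shape=ndata)

    # likelihood for each observed value
    points = pm.Normal('obs', mu=means[category], sd=sd[category], observed=data)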
Anjum Sayed
  • It's worth checking out Section 4.2 of https://www.amazon.com/Data-Analysis-Bayesian-Devinderjit-Sivia/dp/0198568320 - such an excellent book – jtlz2 Aug 08 '18 at 09:09
  • What priors were you using? It's worth trying to look at the posterior distribution. If you have overlapping prior ranges the posterior could be multimodal and then you can get trapped in a local maximum. You can also get swapping degeneracies/shadowing :( – jtlz2 Aug 08 '18 at 11:19
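
To make the swapping degeneracy mentioned in the comment above concrete, here is a minimal sketch in plain Python (no PyMC3 needed). It evaluates a three-component Gaussian mixture log-likelihood, with the cluster labels summed out, under every relabelling of the components. The data and parameter values are made up purely for illustration, and the helper names `normal_logpdf` and `mixture_loglike` are hypothetical, not part of any library.

```python
import math
from itertools import permutations


def normal_logpdf(x, mu, sd):
    """Log density of a Normal(mu, sd) distribution at x."""
    return -0.5 * math.log(2 * math.pi) - math.log(sd) - 0.5 * ((x - mu) / sd) ** 2


def mixture_loglike(data, weights, means, sds):
    """Log-likelihood of a Gaussian mixture with the cluster labels summed out."""
    total = 0.0
    for x in data:
        terms = [math.log(w) + normal_logpdf(x, m, s)
                 for w, m, s in zip(weights, means, sds)]
        top = max(terms)  # log-sum-exp trick for numerical stability
        total += top + math.log(sum(math.exp(t - top) for t in terms))
    return total


# Hypothetical values, chosen only for illustration.
data = [-0.3, 0.1, 1.4, 1.7, 2.9, 3.2]
weights = [0.2, 0.5, 0.3]
means = [0.0, 1.5, 3.0]
sds = [0.5, 0.4, 0.6]

# Evaluate the likelihood under all 3! = 6 relabellings of the components.
loglikes = [
    mixture_loglike(data,
                    [weights[i] for i in order],
                    [means[i] for i in order],
                    [sds[i] for i in order])
    for order in permutations(range(3))
]

# The spread across relabellings is zero up to floating-point rounding.
spread = max(loglikes) - min(loglikes)
```

Because the likelihood cannot tell the relabellings apart, the posterior has up to k! equivalent modes unless the prior or an ordering constraint (such as the potential on `means` in the question) breaks the symmetry.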

1 Answer

It turns out that there is an excellent blog article on this topic here: http://austinrochford.com/posts/2016-02-25-density-estimation-dpm.html
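
Judging by its URL, that article is about density estimation with Dirichlet process mixtures, where the mixture weights are usually built by a truncated stick-breaking construction instead of a fixed-size Dirichlet. As a rough sketch of that idea only (not the article's code), the snippet below draws such weights with the standard library. The function name `stick_breaking_weights` and the values of `alpha` and `n_components` are assumptions made for illustration.

```python
import random


def stick_breaking_weights(alpha, n_components, rng):
    """Draw truncated stick-breaking weights with concentration alpha.

    Returns the list of weights and the leftover stick length that the
    truncation discards.
    """
    weights = []
    remaining = 1.0  # length of the stick not yet assigned to a component
    for _ in range(n_components):
        beta = rng.betavariate(1.0, alpha)  # fraction of the remainder to break off
        weights.append(beta * remaining)
        remaining *= 1.0 - beta
    return weights, remaining


rng = random.Random(0)  # fixed seed so the draw is reproducible
weights, leftover = stick_breaking_weights(alpha=2.0, n_components=20, rng=rng)
```

The weights and the leftover always sum to one. A smaller `alpha` concentrates the mass on the first few components, so the number of clusters that actually matter is inferred from the data and does not have to be fixed at three in advance.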

Anjum Sayed