
I am currently trying to do model checking with PyMC, where my model is a Bernoulli model with a Beta prior. I want to (i) produce a goodness-of-fit (gof) plot and (ii) calculate the posterior predictive p-value.

I have got my code running with a Binomial model, but I am struggling to find the right way to make a Bernoulli model work. Unfortunately, I have not found an example anywhere that I can work with. My code looks like the following:

import pymc as mc
import numpy as np

alpha = 2
beta = 2
n = 13
yes = np.array([1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0])  # 10 successes, 3 failures

p = mc.Beta('p', alpha, beta)
surv = mc.Bernoulli('surv', p=p, observed=True, value=yes)
surv_sim = mc.Bernoulli('surv_sim', p=p)

mc_est = mc.MCMC({'p': p, 'surv': surv, 'surv_sim': surv_sim})
mc_est.sample(10000, 5000, 2)

import matplotlib.pylab as plt
plt.hist(mc_est.surv_sim.trace(), bins=range(0, 3), normed=True)
plt.figure()
plt.hist(mc_est.p.trace(), bins=100, normed=True)

mc.Matplot.gof_plot(mc_est.surv_sim.trace(), 10/13., name='surv')

# here I have issues
D = mc.discrepancy(yes, surv_sim, p.trace())
mc.Matplot.discrepancy_plot(D)

The main problem I am having is determining the expected values for the discrepancy function. Just using p.trace() does not work here, as those are the probabilities. Somehow I need to incorporate the sample size, but I am struggling to do that in a way that is analogous to the Binomial model. I am also not quite sure whether I am calling gof_plot correctly.
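For comparison, this is roughly the kind of Binomial setup I mean (just a sketch, with the same data collapsed into a single count of 10 successes out of 13 trials); there the sample size enters the expected values directly:

import pymc as mc
import numpy as np

alpha = 2
beta = 2
trials = 13
k = 10  # the same data collapsed into one Binomial count

p = mc.Beta('p', alpha, beta)
surv = mc.Binomial('surv', n=trials, p=p, observed=True, value=k)
surv_sim = mc.Binomial('surv_sim', n=trials, p=p)

mc_est = mc.MCMC({'p': p, 'surv': surv, 'surv_sim': surv_sim})
mc_est.sample(10000, 5000, 2)

# the expected counts for mc.discrepancy are trials * p for each posterior draw of p,
# i.e. mc.binomial_expval(trials, p.trace())
expected_counts = trials * p.trace()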

Hope someone can help me out here! Thanks!

fsociety

1 Answer


Per the discrepancy function doc string, the parameters are:

observed : Iterable of observed values (size=(n,))
simulated : Iterable of simulated values (size=(r,n))
expected : Iterable of expected values (size=(r,) or (r,n))

So you need to correct two things:

1) modify your simulated results to have size n (i.e., 13 in your example):

surv_sim = mc.Bernoulli('surv_sim', p=p, size=n)

2) wrap your p.trace() in the bernoulli_expval function:

D = mc.discrepancy(yes, surv_sim.trace(), mc.bernoulli_expval(p.trace()))

(bernoulli_expval just spits back p.)
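Putting the two changes together with the rest of your code, the model-checking part would look like this (same variable names as in your question):

import pymc as mc
import numpy as np

alpha = 2
beta = 2
n = 13
yes = np.array([1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0])

p = mc.Beta('p', alpha, beta)
surv = mc.Bernoulli('surv', p=p, observed=True, value=yes)
surv_sim = mc.Bernoulli('surv_sim', p=p, size=n)  # change 1: simulated values now have shape (r, n)

mc_est = mc.MCMC({'p': p, 'surv': surv, 'surv_sim': surv_sim})
mc_est.sample(10000, 5000, 2)

# change 2: observed is (n,), simulated is (r, n), expected is (r,), matching the doc string above
D = mc.discrepancy(yes, surv_sim.trace(), mc.bernoulli_expval(p.trace()))
mc.Matplot.discrepancy_plot(D, name='surv')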

With those two changes, I get the following:

[image: resulting discrepancy plot]

inversion
  • Thanks a lot for your response. The hint with the `size` parameter is great! Regarding the expected value, I did not use the `expval` function since, as you mentioned, it only spits back `p`. So I am still unsure whether the discrepancy is doing what it should with this solution: it now compares the binary outcomes of the data and simulations against the probability parameters. – fsociety Aug 18 '15 at 10:55
  • Another thought: If I extend this model to a Categorical model, I am even more unsure how to handle it. – fsociety Aug 18 '15 at 11:41
  • I'm not terribly familiar with the `discrepancy` function. In [this thread](http://stackoverflow.com/questions/30731681/goodness-of-fit-in-pymc-and-plotting-discrepancies), one of the authors of `pymc` advises using `gof_plot` to produce posterior predictive plots. I don't have any insight into why. – inversion Aug 18 '15 at 18:30