
I'm working through a book called Bayesian Analysis in Python. The book focuses heavily on the package PyMC3 but is a little vague on the theory behind it, so I'm quite confused.

Say I have data like this:

data = np.array([51.06, 55.12, 53.73, 50.24, 52.05, 56.40, 48.45, 52.34, 55.65, 51.49, 51.86, 63.43, 53.00, 56.09, 51.93, 52.31, 52.33, 57.48, 57.44, 55.14, 53.93, 54.62, 56.09, 68.58, 51.36, 55.47, 50.73, 51.94, 54.95, 50.39, 52.91, 51.5, 52.68, 47.72, 49.73, 51.82, 54.99, 52.84, 53.19, 54.52, 51.46, 53.73, 51.61, 49.81, 52.42, 54.3, 53.84, 53.16]) 

And I'm looking at a model like this:

[image: the model — y ~ Normal(mu, sigma), with priors mu ~ Normal(M, S) and sigma ~ HalfNormal(G)]

Using Metropolis sampling, how can I fit a model that estimates mu and sigma?

Here is my guess at pseudo-code from what I've read:

M, S = 50, 1
G = 1

# These are priors right?
mu = stats.norm(loc=M, scale=S)
sigma = stats.halfnorm(scale=G)

target = stats.norm

steps = 1000

mu_samples = [50]
sigma_samples = [1]

for i in range(steps):
    # proposed sample...
    mu_i, sigma_i = mu.rvs(), sigma.rvs()

    # Something happens here
    # How do I calculate the likelihood??
    "..."
    # some evaluation of a likelihood ratio??

    a = "some"/"ratio"

    acceptance_bar = np.random.random()

    if a > acceptance_bar:
        mu_samples.append(mu_i)
        sigma_samples.append(sigma_i)

What am I missing??

RSHAP

1 Answer


I hope the following example helps you. In this example I am going to assume we know the value of sigma, so we only have a prior for mu.

import numpy as np
from scipy import stats

M, S = 50, 1  # prior hyperparameters for mu, as in the question
sigma = data.std() # we are assuming we know sigma

steps = 1000
mu_old = data.mean() # initial value, just a good guess
mu_samples = []

# we evaluate the prior for the initial point
prior_old = stats.norm(M, S).pdf(mu_old)
# we evaluate the likelihood for the initial point
likelihood_old = np.prod(stats.norm(mu_old, sigma).pdf(data))
# Bayes' theorem (omitting the denominator) for the initial point
post_old = prior_old * likelihood_old

for i in range(steps):
    # proposal distribution, propose a new value from the old one
    mu_new = stats.norm.rvs(mu_old, 0.1)

    # we evaluate the prior
    prior_new = stats.norm(M, S).pdf(mu_new)

    # we evaluate the likelihood
    likelihood_new = np.prod(stats.norm(mu_new, sigma).pdf(data))

    # Bayes' theorem (omitting the denominator)
    post_new = prior_new * likelihood_new

    # the ratio of posteriors (we do not need to know the normalizing constant)
    a = post_new / post_old

    if np.random.random() < a:
        mu_old = mu_new
        post_old = post_new

    mu_samples.append(mu_old)

Notes:

  • Notice that I have defined a proposal distribution, which in this case is a Gaussian centered at mu_old with a standard deviation of 0.1 (an arbitrary value). In practice the efficiency of MH depends heavily on the proposal distribution, so PyMC3 (like other practical implementations of MH) uses heuristics to tune the proposal distribution.
  • For simplicity I used the pdf in this example, but in practice it is convenient to work with the logpdf. This avoids underflow problems without changing the results.
  • The likelihood is computed as a product of the densities of the individual data points, since the observations are assumed to be independent.
  • The ratio you were missing is the ratio of posteriors
  • If you do not accept the new proposed value, then you save (again) the old value.
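Putting the last two notes together, here is a sketch (not from the book) that extends the sampler to estimate both mu and sigma, working with logpdf instead of pdf so the product of likelihood terms becomes a sum that does not underflow. It uses the Normal(M, S) and HalfNormal(G) priors and the data from the question; the 0.1 proposal widths are as arbitrary as before:

```python
import numpy as np
from scipy import stats

data = np.array([51.06, 55.12, 53.73, 50.24, 52.05, 56.40, 48.45, 52.34,
                 55.65, 51.49, 51.86, 63.43, 53.00, 56.09, 51.93, 52.31,
                 52.33, 57.48, 57.44, 55.14, 53.93, 54.62, 56.09, 68.58,
                 51.36, 55.47, 50.73, 51.94, 54.95, 50.39, 52.91, 51.5,
                 52.68, 47.72, 49.73, 51.82, 54.99, 52.84, 53.19, 54.52,
                 51.46, 53.73, 51.61, 49.81, 52.42, 54.3, 53.84, 53.16])

M, S, G = 50, 1, 1  # hyperparameters from the question

def log_post(mu, sigma):
    """Unnormalized log-posterior: log-prior + log-likelihood."""
    if sigma <= 0:
        return -np.inf  # sigma must be positive; such proposals are rejected
    log_prior = (stats.norm(M, S).logpdf(mu)
                 + stats.halfnorm(scale=G).logpdf(sigma))
    log_like = stats.norm(mu, sigma).logpdf(data).sum()  # sum, not product
    return log_prior + log_like

np.random.seed(42)
steps = 5000
mu_old, sigma_old = data.mean(), data.std()  # reasonable starting point
log_post_old = log_post(mu_old, sigma_old)
mu_samples, sigma_samples = [], []

for _ in range(steps):
    # Gaussian proposals centered at the current values
    mu_new = stats.norm.rvs(mu_old, 0.1)
    sigma_new = stats.norm.rvs(sigma_old, 0.1)
    log_post_new = log_post(mu_new, sigma_new)
    # in log space the ratio becomes a difference
    if np.log(np.random.random()) < log_post_new - log_post_old:
        mu_old, sigma_old = mu_new, sigma_new
        log_post_old = log_post_new
    mu_samples.append(mu_old)
    sigma_samples.append(sigma_old)
```

The acceptance test `u < post_new / post_old` becomes `log(u) < log_post_new - log_post_old`, which is the same decision computed without ever forming the tiny product of 48 densities.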

Remember to check this repository for the Errata and for updated versions of the code. The major difference between the updated code and the code in the book is that the preferred way to run a model with PyMC3 is now to just call pm.sample() and let PyMC3 choose the samplers and initialization points for you.
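For reference, a minimal sketch of that workflow for the model in the question (assuming PyMC3 is installed and `data` is the array defined above; not tested against any particular PyMC3 version):

```python
import pymc3 as pm

with pm.Model() as model:
    mu = pm.Normal('mu', mu=50, sd=1)
    sigma = pm.HalfNormal('sigma', sd=1)
    y = pm.Normal('y', mu=mu, sd=sigma, observed=data)
    trace = pm.sample()  # PyMC3 chooses the sampler and starting points
```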

aloctavodia