
I am looking into the PyMC3 package and am interested in applying it to a scenario where I have several different signals, each with a different amplitude.

However, I am stuck on what priors and what likelihood distribution I would need to use to implement this in PyMC3. An example scenario is shown in the following image:

Signal plot

I tried to implement it below, but I keep getting the error:

pymc3.exceptions.SamplingError: Bad initial energy

My Code

## Signal 1:
with pm.Model() as model:
    # Parameters:
    # Prior Distributions:
    # BoundedNormal = pm.Bound(pm.Exponential, lower=0.0, upper=np.inf)
    # c = BoundedNormal('c', lam=10)
    # c = pm.Uniform('c', lower=0, upper=300)
    alpha = pm.Normal('alpha', mu = 0, sd = 10)
    beta = pm.Normal('beta', mu = 0, sd = 1)
    sigma = pm.HalfNormal('sigma', sd = 1)
    mu = pm.Normal('mu', mu=0, sigma=1)
    sd = pm.HalfNormal('sd', sigma=1)

    # Observed data is from a Multinomial distribution:
    # Likelihood distributions:
    # bradford = pm.DensityDist('observed_data', logp=bradford_logp, observed=dict(value=S1, loc=mu, scale=sd, c=c))
    # observed_data = pm.Beta('observed_data', mu=mu, sd=sd, observed=S1)
    observed_data = pm.Beta('observed_data', alpha=alpha, beta=beta, mu=mu, sd=sd, observed=S1)

with model:
    # obtain starting values via MAP
    startvals = pm.find_MAP(model=model)

    # instantiate sampler
    # step = pm.Metropolis()
    step = pm.HamiltonianMC()
    # step = pm.NUTS()

    # draw 5000 posterior samples
    trace = pm.sample(start=startvals, draws=1000, step=step, tune=500, chains=4, cores=1, discard_tuned_samples=True)

    # Obtaining Posterior Predictive Sampling:
    post_pred = pm.sample_posterior_predictive(trace, samples=500)
    print(post_pred['observed_data'].shape)

plt.title('Trace Plot of Signal 1')
pm.traceplot(trace, var_names=['mu', 'sd'], divergences=None, combined=True)
plt.show(block=False)
plt.pause(5)  # Pauses the program for 5 seconds
plt.close('all')

pm.plot_posterior(trace, var_names=['mu', 'sd'])
plt.title('Posterior Plot of Signal 1')
plt.show(block=False)
plt.pause(5)  # Pauses the program for 5 seconds
plt.close('all')

Side questions

I have also been looking into implementing goodness-of-fit tests and a Kalman filter with distributions other than the Gaussian, so if you have the time I would appreciate it if you could take a look at them. Both questions can be found here:

Goodness-of-fit tests link: Goodness-of-fit test

Kalman filter link: Kalman Filter


Edit 1

Say I have around 5 signals and would like to apply Bayesian inference in order to see the difference in the PDFs of the signals. How do I approach this problem? Do I need to create multiple models and get their posterior distributions, like in this image:

Distribution plots

If I need to get the posterior distribution, do I use the following code?

# Obtaining Posterior Predictive Sampling:
post_pred = pm.sample_posterior_predictive(trace, samples=500)

Edit 2

If I have multiple signals, can I implement it this way in order to see the changes in alpha and beta across all signals?

        observed_data_S1 = pm.Beta('observed_data_S1', alpha=alpha[0], beta=beta[0], observed=S1[0])
        observed_data_S2 = pm.Beta('observed_data_S2', alpha=alpha[1], beta=beta[1], observed=S2[0])
        observed_data_S3 = pm.Beta('observed_data_S3', alpha=alpha[2], beta=beta[2], observed=S3[0])
        observed_data_S4 = pm.Beta('observed_data_S4', alpha=alpha[3], beta=beta[3], observed=S4[0])
        observed_data_S5 = pm.Beta('observed_data_S5', alpha=alpha[4], beta=beta[4], observed=S5[0])
        observed_data_S6 = pm.Beta('observed_data_S6', alpha=alpha[5], beta=beta[5], observed=S6[0])
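
Or, instead of listing each likelihood separately, would stacking the signals and giving alpha and beta a shape argument be equivalent? A sketch of what I mean (with placeholder data; in practice each column would be one of my actual signals):

import numpy as np
import pymc3 as pm

n_signals = 6
data = np.random.rand(100, n_signals)  # placeholder: one column per signal, values in (0, 1)

with pm.Model() as multi_model:
    alpha = pm.HalfNormal('alpha', sigma=10.0, shape=n_signals)
    beta = pm.HalfNormal('beta', sigma=10.0, shape=n_signals)
    # Broadcasting matches column i of `data` with alpha[i] and beta[i]:
    observed_data = pm.Beta('observed_data', alpha=alpha, beta=beta, observed=data)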

Edit 3:

How can I plot multiple traces in one plot? I am looking at multiple signals and thought of combining all the alphas and betas together in one plot.
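
Would something like this work, assuming the vectorized alpha and beta from Edit 2 so that each variable holds one component per signal? (I am not sure whether the compact option is available in older versions of the plotting backend.)

# 'trace' is assumed to come from the multi-signal model sketched in Edit 2.
# compact=True overlays all components of each vector-valued variable on shared axes,
# and combined=True merges the chains.
pm.traceplot(trace, var_names=['alpha', 'beta'], compact=True, combined=True)
plt.show()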

WDpad159
  • This error is a result of the log-likelihood taking `np.inf` or `np.nan` values somewhere in your model. Can you please run `model.check_test_point()`? This will give you a full report of the log-prob values taken by each free RV in your model. Once you find which variable takes `np.inf` or `np.nan` values, you can trace back and correct that RV's distribution. – Tirth Patel May 08 '20 at 10:07

1 Answer


First Mistake: The Beta distribution's parameters alpha and beta must be positive. You have used a Normal prior on them, which allows those RVs to take negative and zero values. You can easily fix that by using pm.Bound on the pm.Normal distribution, or by using a pm.HalfNormal distribution instead.
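
For instance, a minimal sketch of the pm.Bound approach (the priors here are illustrative, not tuned to your data):

import numpy as np
import pymc3 as pm

S1 = np.random.rand(10)

with pm.Model() as bounded_model:
    # Normal restricted to positive values, so alpha and beta stay in the Beta's support:
    BoundedNormal = pm.Bound(pm.Normal, lower=0.0)
    alpha = BoundedNormal('alpha', mu=0.0, sigma=10.0)
    beta = BoundedNormal('beta', mu=0.0, sigma=1.0)
    observed_data = pm.Beta('observed_data', alpha=alpha, beta=beta, observed=S1)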

Second Mistake: The other inconsistency is specifying mu and sigma along with the alpha and beta parameters. Beta accepts either mu and sigma or alpha and beta, but not both. The default behaviour is to use the alpha and beta parameters over the mu and sigma parameters, so you are wasting a lot of computational power inferring mu and sigma for nothing.
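
If you prefer the mu/sigma form (often more interpretable), pass only those two. A minimal sketch with illustrative priors, chosen so that the implied alpha and beta stay positive (pm.Beta requires mu in (0, 1) and sigma**2 < mu * (1 - mu)):

import numpy as np
import theano.tensor as tt
import pymc3 as pm

S1 = np.random.rand(10)

with pm.Model() as alt_model:
    mu = pm.Beta('mu', alpha=2.0, beta=2.0)            # keeps the mean inside (0, 1)
    frac = pm.Uniform('frac', lower=0.01, upper=0.99)
    # sigma is a fraction of its largest admissible value, so the implied alpha, beta stay positive:
    sigma = pm.Deterministic('sigma', frac * tt.sqrt(mu * (1.0 - mu)))
    observed_data = pm.Beta('observed_data', mu=mu, sigma=sigma, observed=S1)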

Other Comments: You should not use the sd parameter in any distribution as of version 3.8, as it has been deprecated and will be removed in 3.9. Use sigma instead.

Corrected Version:

import numpy as np
import theano
import theano.tensor as tt
import pymc3 as pm
import matplotlib.pyplot as plt

S1 = np.random.rand(10)

## Signal 1:
with pm.Model() as model:
    # Parameters:
    # Prior Distributions:
    # BoundedNormal = pm.Bound(pm.Exponential, lower=0.0, upper=np.inf)
    # c = BoundedNormal('c', lam=10)
    # c = pm.Uniform('c', lower=0, upper=300)
    alpha = pm.HalfNormal('alpha', sigma=10)
    beta = pm.HalfNormal('beta', sigma=1)

    # Observed data is modelled with a Beta likelihood:
    # Likelihood distributions:
    # bradford = pm.DensityDist('observed_data', logp=bradford_logp, observed=dict(value=S1, loc=mu, scale=sd, c=c))
    # observed_data = pm.Beta('observed_data', mu=mu, sd=sd, observed=S1)
    observed_data = pm.Beta('observed_data', alpha=alpha, beta=beta, observed=S1)

with model:
    # obtain starting values via MAP
    startvals = pm.find_MAP(model=model)

    # instantiate sampler
    # step = pm.Metropolis()
    step = pm.HamiltonianMC()
    # step = pm.NUTS()

    # draw 1000 posterior samples per chain
    trace = pm.sample(start=startvals, draws=1000, step=step, tune=500, chains=4, cores=1, discard_tuned_samples=True)

    # Obtaining Posterior Predictive Sampling:
    post_pred = pm.sample_posterior_predictive(trace, samples=500)
    print(post_pred['observed_data'].shape)

plt.title('Trace Plot of Signal 1')
pm.traceplot(trace, var_names=['alpha', 'beta'], divergences=None, combined=True)
plt.show(block=False)
plt.pause(5)  # Pauses the program for 5 seconds
plt.close('all')

pm.plot_posterior(trace, var_names=['alpha', 'beta'])
plt.title('Posterior Plot of Signal 1')
plt.show(block=False)
plt.pause(5)  # Pauses the program for 5 seconds
plt.close('all')
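
If you run into Bad initial energy again with your real data, a quick way to locate the offending variable (as mentioned in the comment under your question) is to print the log-probability of each free RV at the model's test point:

# A -inf or nan entry identifies the variable whose prior or likelihood is misspecified:
print(model.check_test_point())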


Tirth Patel
  • Thank you for your answer and explanation; it seems to work, and I have a question. It runs the samples based on the number of chains, but each chain takes a while. Is there a way to decrease the time it takes to run each chain? – WDpad159 May 08 '20 at 14:34
  • Increase the number of cores to sample in parallel. Do `trace = pm.sample(start=startvals, draws=1000, step=step, tune=500, chains=4, discard_tuned_samples=True)`. This will use all the cores by default. – Tirth Patel May 08 '20 at 14:36
  • When I change the number of cores, it doesn't run the chains and the program just finishes the code. – WDpad159 May 08 '20 at 15:05
  • I also have a question: say I have around 5 signals and would like to apply Bayesian inference in order to see the difference in the PDFs of the signals. How do I approach this problem? Do I need to create multiple models and get their posterior distributions? Also, a side question: can I assume that Bayesian inference behaves similarly to the concept of the 1D Kalman filter, so that it can be called a Bayesian filter? – WDpad159 May 08 '20 at 15:19
  • Which OS are you using? It works on my Linux machine. – Tirth Patel May 08 '20 at 15:59
  • I am using Windows and implementing the code in Pycharm – WDpad159 May 08 '20 at 16:01
  • Can you please report this at https://github.com/pymc-devs/pymc3/issues/new? You will get better support there. – Tirth Patel May 08 '20 at 16:05
  • I will ask them about it. Also, did you have a look at my question in the last comment? – WDpad159 May 08 '20 at 16:47
  • I am not so comfortable with Kalman Filters so you may want to ask it there as well – Tirth Patel May 08 '20 at 16:55
  • Okay, what about this question: "Say I have around 5 signals and would like to apply Bayesian inference in order to see the difference in the PDFs of the signals. How do I approach this problem? Do I need to create multiple models and get their posterior distributions?" – WDpad159 May 08 '20 at 17:04
  • I have provided a solution on the github issue tracker. Let's continue the discussion there. – Tirth Patel May 08 '20 at 17:26
  • While I agree the standard parameterization of a Beta is `alpha`, `beta`, the `mu`, `sd` one is usually more interpretable and can often sample more efficiently due to less correlation between the parameters. – merv May 09 '20 at 05:27
  • @merv What I meant was either specify alpha, beta or mu, sigma. I agree that the mu, sigma parameterization is more interpretable, but passing them along with alpha and beta will lead to an inconsistent state. Sorry for not being clear! – Tirth Patel May 09 '20 at 05:33
  • Link to my question in Pymc3: https://discourse.pymc.io/t/how-to-decide-on-what-priors-distributions-to-use-for-parameters/5052/2 – WDpad159 May 09 '20 at 08:37