0

I'm studying Bayesian statistics and I'm trying to estimate the mean of a normal distribution given a normal prior and data which are normally distributed. I have formulas to compute the posterior distribution analitically but I want to solve it by simulation and see if results coincide.

The prior for the mean is N(160, 10). My data are 120 samples stored in a column of a Pandas dataframe and they follow a normal distribution N(256, 2301). Using formulas, my posterior distribution for the mean should be N(195, 6.57).

However when plotting the trace results do not coincide. Here is the code:

with pm.Model() as model: 
    mu = pm.Normal('parameters', mu=160, sd=50)
    observed_data = pm.Normal(
        'observed_data', mu=mu, observed=df['Cholestoral']) 
with model:
# Sample from the posterior
    trace = pm.sample(draws=300, chains=2, tune=300, 
                      discard_tuned_samples=True)

If I plot the posterior this way, I get a normal with mean=256 and very little variance. If I add to the observed_data the variance of my data, 2301, I get as a posterior a normal with mean=160.

I don't understand what's wrong with my code. Can anyone help?

letsintegreat
  • 3,328
  • 4
  • 18
  • 39
emanuele_f
  • 35
  • 1
  • 5
  • The variance of the observed data is not fitted currently and it is left hardcoded to its default value of 1 which is why you get such small variance. When you set the variance to 2301, the likelihood is too wide and you basically recover the prior. You should probably review the analytical calculation to see how is the variance handled and then try to get the code to match the analytical behaviour. What likelihood are you trying to use? – OriolAbril Jun 09 '20 at 21:14
  • I'm using as likelihood 120 samples from a normal distribution with mean 256 and variance 2301. – emanuele_f Jun 10 '20 at 14:32
  • The variance for the posterior from the formula should be (1/sigma0^2+n/sigma^2)^-1 where sigma0^2 is the variance of the prior and sigma^2 is the variance of the data. I now understand why it is giving me such results thank you, but I still don't understand why it does not work. – emanuele_f Jun 10 '20 at 14:39
  • Your likelihood is not actually a normal with mean 256 and 2301, this is what is generally called data generating truth, the likelihood is the probability of the data _conditional_ on the parameters of your model. In the code above, your model has one parameter `mu` which means that your actual likelihood is a normal with mean `mu` and sigma 1. I am guessing the likelihood you want is a normal with mean `mu` and sigma `sigma`, in which case you also need a prior for sigma (you should already have it given that you did the analytical calculation) – OriolAbril Jun 10 '20 at 14:58
  • I would also recommend reading some introductory book _with examples and code_ on bayesian statistics, one that could fit this goal is [Think Bayes](https://greenteapress.com/wp/think-bayes/), it will probably be able to explain much better the issues you are encountering which are not about pymc3 usage but seem more conceptual about what is the prior, likelihood and posterior. Also, given that you are familiar with the analytical calculations, using mathematical formulas should also help – OriolAbril Jun 11 '20 at 01:20

0 Answers0