Incorporating uncertainty into a pymc3 model

Question

I have a set of data for which I have the mean, standard deviation and number of observations for each point (i.e., I have knowledge regarding the accuracy of the measure). In a traditional pymc3 model where I look only at the means, I may do something along the lines of:

x = data['mean']

with pm.Model() as m:
    a = pm.Normal('a', mu=0, sd=1)
    b = pm.Normal('b', mu=1, sd=1)
    y = a + b*x

    eps= pm.HalfNormal('eps', sd=1)
    likelihood = pm.Normal('likelihood', mu=y, sd=eps, observed=x)

What is the best way to incorporate the information regarding the variance of the observations into the model? Obviously the result should weight low-variance observations more heavily than high-variance (less certain) observations.

One approach a statistician suggested was to do the following:

x = data['mean']   # mean of observation
x_sd = data['sd']  # sd of observation
x_n = data['n']    # of measures for observation
x_sem = x_sd/np.sqrt(x_n)

with pm.Model() as m:
    a = pm.Normal('a', mu=0, sd=1)
    b = pm.Normal('b', mu=1, sd=1)
    y = a + b*x

    eps = pm.HalfNormal('eps', sd=1)
    obs = mc.Normal('obs', mu=x, sd=x_sem, shape=len(x))
    likelihood = pm.Normal('likelihood', mu=y, eps=eps, observed=obs)

However, when I run this I get:

TypeError: observed needs to be data but got: <class 'pymc3.model.FreeRV'>

I am running the master branch of pymc3 (3.0 has some performance issues resulting in very slow sample times).

If I understand correctly, what you are looking at is known as an "errors in variables" model. A web search for `Bayesian errors in variables` turns up some hits. Do you have to use pymc3? Consider working with examples in JAGS or Stan in order to get an understanding of the problem. — Robert Dodier, May 23 '17 at 21:23

aloctavodia · Accepted Answer · 2017-05-24T10:50:39.643

You are close, you just need to make some small changes. The main reason is that for PyMC3 data is always constant. Check the following code:

with pm.Model() as m:
    a = pm.Normal('a', mu=0, sd=1)
    b = pm.Normal('b', mu=1, sd=1)
    mu = a + b*x

    mu_est = pm.Normal('mu_est', mu, x_sem, shape=len(x))

    likelihood = pm.Normal('likelihood', mu=mu_est, sd=x_sd, observed=x)

Notice than I keep the data fixed and I introduce the observed uncertainty at two points: for the estimation of mu_est and for the likelihood. Of course you are free to do not use x_sem or x_sd and instead estimate them, like you did in your code with the variable eps.

On a historical note, code with "random data" used to work on PyMC3 (at least for some models), but given that it was not really designed to work that way, developers decided to prevent the user from using random data, and that explains the message you got.

Incorporating uncertainty into a pymc3 model

1 Answers1