I have a set of data for which I have the mean, standard deviation and number of observations for each point (i.e., I have knowledge regarding the accuracy of the measure). In a traditional pymc3 model where I look only at the means, I may do something along the lines of:
x = data['mean']
with pm.Model() as m:
a = pm.Normal('a', mu=0, sd=1)
b = pm.Normal('b', mu=1, sd=1)
y = a + b*x
eps= pm.HalfNormal('eps', sd=1)
likelihood = pm.Normal('likelihood', mu=y, sd=eps, observed=x)
What is the best way to incorporate the information regarding the variance of the observations into the model? Obviously the result should weight low-variance observations more heavily than high-variance (less certain) observations.
One approach a statistician suggested was to do the following:
x = data['mean'] # mean of observation
x_sd = data['sd'] # sd of observation
x_n = data['n'] # of measures for observation
x_sem = x_sd/np.sqrt(x_n)
with pm.Model() as m:
a = pm.Normal('a', mu=0, sd=1)
b = pm.Normal('b', mu=1, sd=1)
y = a + b*x
eps = pm.HalfNormal('eps', sd=1)
obs = mc.Normal('obs', mu=x, sd=x_sem, shape=len(x))
likelihood = pm.Normal('likelihood', mu=y, eps=eps, observed=obs)
However, when I run this I get:
TypeError: observed needs to be data but got: <class 'pymc3.model.FreeRV'>
I am running the master branch of pymc3 (3.0 has some performance issues resulting in very slow sample times).