Posterior Predictive Check on PyMC3 Deterministic Variable

Question

TL;DR

What's the right way to do posterior predictive checks on pm.Deterministic variables that take stochastics (rendering the deterministic also stochastic) as input?

Too Short; Didn't Understand

Say we have a pymc3 model like this:

import pymc3 as pm

with pm.Model() as model:
    # Arbitrary, trainable distributions.
    dist1 = pm.Normal("dist1", 0, 1)
    dist2 = pm.Normal("dist2", dist1, 1)

    # Arbitrary, deterministic theano math.
    val1 = pm.Deterministic("val1", arb1(dist2))

    # Arbitrary custom likelihood.
    cdist = pm.DensityDistribution("cdist", logp(val1), observed=get_data())

    # Arbitrary, deterministic theano math.
    val2 = pm.Deterministic("val2", arb2(val1))

I may be misunderstanding, but my intention is for the posteriors of dist1 and dist2 to be sampled, and for those samples to be fed into the deterministic variables. Is the posterior predictive check only possible on observed random variables?

It's straightforward to get posterior predictive samples from dist2 and other random variables using pymc3.sampling.sample_ppc, but the majority of my model's value is derived from the state of val1 and val2, given those samples.

The problem arises in that pm.Deterministic(.) seems to return a th.TensorVariable. So, when this is called:

ppc = pm.sample_ppc(_trace, vars=[val1, val2])

...and pymc3 attempts this block of code in pymc3.sampling:

    410        for var in vars:
--> 411            ppc[var.name].append(var.distribution.random(point=param,
    412                                                          size=size))

...it complains because a th.TensorVariable obviously doesn't have a .distribution.

So, what is the right way to carry the posterior samples of stochastics through deterministics? Do I need to explicitly create a th.function that takes stochastic posterior samples and calculates the deterministic values? That seems silly given the fact that pymc3 already has the graph in place.

azane · Accepted Answer · 2017-02-13T22:36:59.857


Yes, I was misunderstanding the purpose of .sample_ppc. You don't need it for unobserved variables, because the trace already holds their samples. Observed variables are fixed to their data during sampling, so no new values are ever drawn for them; that is why sample_ppc is needed to generate samples for them.
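To make the role of sample_ppc concrete, here is a minimal pure-Python sketch of what a posterior predictive draw for an observed variable amounts to: for each posterior draw of the parameters, simulate a fresh data set from the likelihood. This is not PyMC3 code. The Normal likelihood, the function name sample_replicates, and the hard-coded posterior draws are all assumptions made for illustration.

```python
import random


def sample_replicates(posterior_mu, n_obs, sigma=1.0, seed=0):
    """Simulate one replicated data set per posterior draw.

    posterior_mu: posterior draws of a mean parameter (hypothetical values).
    n_obs: number of observations in each replicated data set.
    sigma: likelihood standard deviation (assumed known for this sketch).
    """
    rng = random.Random(seed)
    replicates = []
    for mu in posterior_mu:
        # A Normal likelihood is assumed here; the question's model uses
        # a custom density instead.
        replicates.append([rng.gauss(mu, sigma) for _ in range(n_obs)])
    return replicates


# Hypothetical posterior draws standing in for values read from a trace.
posterior_mu = [0.1, -0.3, 0.4]
replicates = sample_replicates(posterior_mu, n_obs=5)

# A check would compare a statistic of each replicate with the same
# statistic computed on the real data.
replicate_means = [sum(rep) / len(rep) for rep in replicates]
```

The extra random draw inside the loop is the part that the trace cannot supply, since the observed variable was held at its data during sampling.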

In short, I can gather samples of the pm.Deterministic variables from the trace.
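In PyMC3 this means indexing the trace by variable name, for example _trace["val1"] and _trace["val2"], since deterministics are recorded alongside the stochastics at every draw. The pure-Python sketch below mimics only that bookkeeping. It is a toy stand-in, not PyMC3: the draws come from the prior rather than a fitted posterior, and arb1 and arb2 are hypothetical placeholders for the question's undefined functions.

```python
import math
import random


def arb1(x):
    # Hypothetical stand-in for the question's arb1.
    return math.tanh(x)


def arb2(x):
    # Hypothetical stand-in for the question's arb2.
    return 2.0 * x + 1.0


def toy_sample(n_draws, seed=0):
    """Record stochastics and the deterministics derived from them, per draw."""
    rng = random.Random(seed)
    trace = {"dist1": [], "dist2": [], "val1": [], "val2": []}
    for _ in range(n_draws):
        # Prior draws only; a real sampler would also account for the
        # likelihood of the observed data.
        dist1 = rng.gauss(0.0, 1.0)
        dist2 = rng.gauss(dist1, 1.0)
        # Deterministics are computed from the current draw and stored
        # next to it, so no separate function has to be applied afterwards.
        val1 = arb1(dist2)
        val2 = arb2(val1)
        trace["dist1"].append(dist1)
        trace["dist2"].append(dist2)
        trace["val1"].append(val1)
        trace["val2"].append(val2)
    return trace


trace = toy_sample(200)
val1_samples = trace["val1"]
val2_samples = trace["val2"]
```

Because each stored val1 is computed from the dist2 at the same index, the deterministic samples stay aligned with the stochastic samples they were derived from.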

edited Feb 13 '17 at 22:36

answered Feb 13 '17 at 21:36

