
I am studying zero-inflated temporal count data. I have built a Stan model that handles the zero inflation with an if statement in the model block, as advised in the Stan User's Guide, e.g.,

model { 
   for (n in 1:N) { 
      if (y[n] == 0) 
         target += log_sum_exp(bernoulli_lpmf(1 | theta),
                               bernoulli_lpmf(0 | theta)
                                 + poisson_lpmf(y[n] | lambda));
      else
         target += bernoulli_lpmf(0 | theta) + poisson_lpmf(y[n] | lambda);
   } 
}
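For reference, the per-observation marginal likelihood this loop accumulates can be checked in plain Python. This is just a sketch mirroring the Stan code; `log_sum_exp` and `zip_loglik` are helper names I have made up:

```python
import math

def log_sum_exp(a, b):
    """Numerically stable log(exp(a) + exp(b)), as in Stan's log_sum_exp."""
    m = max(a, b)
    return m + math.log(math.exp(a - m) + math.exp(b - m))

def zip_loglik(y, theta, lam):
    """Marginal zero-inflated Poisson log-likelihood, mirroring the Stan loop.

    theta: zero-inflation probability; lam: Poisson rate.
    """
    total = 0.0
    for yn in y:
        # poisson_lpmf(yn | lam)
        log_pois = -lam + yn * math.log(lam) - math.lgamma(yn + 1)
        if yn == 0:
            # a zero can come from the inflation component or the Poisson
            total += log_sum_exp(math.log(theta),
                                 math.log(1 - theta) + log_pois)
        else:
            total += math.log(1 - theta) + log_pois
    return total
```

Exponentiating recovers the usual mixture density: P(y=0) = theta + (1-theta)·exp(-lam), and P(y=k) = (1-theta)·Poisson(k | lam) for k > 0.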

This if statement is necessary because Stan's NUTS sampler cannot handle discrete parameters, so the discrete indicator is marginalised out rather than sampled. I have not had much experience with PyMC3, but my understanding is that it can perform a Gibbs update step to sample the discrete Bernoulli indicator; then, conditioned on the zero-inflation indicator, it could perform a Metropolis or NUTS update for the parameters of the Poisson likelihood.

My question is: can PyMC3 be used (and if so, how) to sample the discrete zero-inflation variable while the updates to the continuous parameters are performed with NUTS? If it can, is the performance significantly better than the Stan implementation above, which marginalises out the discrete random variable? Further, if PyMC3 can only support a Gibbs + Metropolis update, is this move away from NUTS worth considering?

nick
    Yes, PyMC3 can block update continuous and discrete parameters to provide discrete sampling. The only problem is that it will be *slower* and *less accurate* and *less robust*. Marginalizing is almost always a win for efficiency/mixing due to the Rao-Blackwell theorem and for accuracy by working in expectation. This is explained with an example in the Stan user's guide chapter on latent discrete parameters (the change-point model is also available in PyMC3). So if you can marginalize in PyMC3 (or BUGS or JAGS), that'll be a big win for efficiency and accuracy. – Bob Carpenter Mar 25 '19 at 15:28
  • Thanks very much Bob. I was not aware of the ties to Rao-Blackwell. I'll work though that to understand more. – nick Mar 26 '19 at 08:15
  • Roughly speaking, the theorem says that working in expectation is more efficient. By marginalizing out the discrete parameters, you work in expectation. – Bob Carpenter Mar 27 '19 at 17:28

0 Answers