PyMC3, NUTS sampler, what's happening here?

Question

Can someone point me to the docs that will explain what I'm seeing?

Pink stuff in a Jupyter notebook makes me think something is wrong.

Using PyMC3 (btw, it's an exercise for a class and I have no idea what I'm doing).

I plugged in the numbers, initially got an error about 0s on the diagonal, swapped alpha_est and rate_est to be 1/alpha_est and 1/rate_est (and stopped getting the error), but I still get the pink stuff.

This code came with the exercise:

# An initial guess for the gamma distribution's alpha and beta
# parameters can be made as described here: 
# https://wiki.analytica.com/index.php?title=Gamma_distribution

alpha_est = np.mean(no_insurance)**2 / np.var(no_insurance)
beta_est = np.var(no_insurance) / np.mean(no_insurance)

# PyMC3 Gamma seems to use rate = 1/beta
rate_est = 1/beta_est
# Initial parameter estimates we'll use below
alpha_est, rate_est

And then the code I'm supposed to add:

Should the pink stuff make me nervous or do I just say "No errors, move on"?

=======

The "zero problem"

---------------------------------------------------------------------------
RemoteTraceback                           Traceback (most recent call last)
RemoteTraceback: 
"""
Traceback (most recent call last):
  File "/Local/Users/vlb/anaconda3/lib/python3.7/site-packages/pymc3/parallel_sampling.py", line 110, in run
    self._start_loop()
  File "/Local/Users/vlb/anaconda3/lib/python3.7/site-packages/pymc3/parallel_sampling.py", line 160, in _start_loop
    point, stats = self._compute_point()
  File "/Local/Users/vlb/anaconda3/lib/python3.7/site-packages/pymc3/parallel_sampling.py", line 191, in _compute_point
    point, stats = self._step_method.step(self._point)
  File "/Local/Users/vlb/anaconda3/lib/python3.7/site-packages/pymc3/step_methods/arraystep.py", line 247, in step
    apoint, stats = self.astep(array)
  File "/Local/Users/vlb/anaconda3/lib/python3.7/site-packages/pymc3/step_methods/hmc/base_hmc.py", line 130, in astep
    self.potential.raise_ok(self._logp_dlogp_func._ordering.vmap)
  File "/Local/Users/vlb/anaconda3/lib/python3.7/site-packages/pymc3/step_methods/hmc/quadpotential.py", line 231, in raise_ok
    raise ValueError('\n'.join(errmsg))
ValueError: Mass matrix contains zeros on the diagonal. 
The derivative of RV `alpha__log__`.ravel()[0] is zero.
"""

The above exception was the direct cause of the following exception:

ValueError                                Traceback (most recent call last)
ValueError: Mass matrix contains zeros on the diagonal. 
The derivative of RV `alpha__log__`.ravel()[0] is zero.

The above exception was the direct cause of the following exception:

RuntimeError                              Traceback (most recent call last)
<ipython-input-14-36f8e5cebbe5> in <module>
     13     g = pm.Gamma('g', alpha=alpha_, beta=rate_, observed=no_insurance)
     14 
---> 15     trace = pm.sample(10000)

/Local/Users/vlb/anaconda3/lib/python3.7/site-packages/pymc3/sampling.py in sample(draws, step, init, n_init, start, trace, chain_idx, chains, cores, tune, progressbar, model, random_seed, discard_tuned_samples, compute_convergence_checks, **kwargs)
    435             _print_step_hierarchy(step)
    436             try:
--> 437                 trace = _mp_sample(**sample_args)
    438             except pickle.PickleError:
    439                 _log.warning("Could not pickle model, sampling singlethreaded.")

/Local/Users/vlb/anaconda3/lib/python3.7/site-packages/pymc3/sampling.py in _mp_sample(draws, tune, step, chains, cores, chain, random_seed, start, progressbar, trace, model, **kwargs)
    967         try:
    968             with sampler:
--> 969                 for draw in sampler:
    970                     trace = traces[draw.chain - chain]
    971                     if (trace.supports_sampler_stats

/Local/Users/vlb/anaconda3/lib/python3.7/site-packages/pymc3/parallel_sampling.py in __iter__(self)
    391 
    392         while self._active:
--> 393             draw = ProcessAdapter.recv_draw(self._active)
    394             proc, is_last, draw, tuning, stats, warns = draw
    395             if self._progress is not None:

/Local/Users/vlb/anaconda3/lib/python3.7/site-packages/pymc3/parallel_sampling.py in recv_draw(processes, timeout)
    295             else:
    296                 error = RuntimeError("Chain %s failed." % proc.chain)
--> 297             raise error from old_error
    298         elif msg[0] == "writing_done":
    299             proc._readable = True

RuntimeError: Chain 0 failed.

is the "hint" in the instructions here telling me I should use 1/rate_est?

You are now going to create your own PyMC3 model!

Use an exponential prior for alpha. Call this stochastic variable alpha_.
Similarly, use an exponential prior for the rate ( 1/ ) parameter in PyMC3's Gamma.
Call this stochastic variable rate_ (but it will be supplied as pm.Gamma's beta parameter). Hint: to set up a prior with an exponential distribution for where you have an initial estimate for of 0 , use a scale parameter of 1/0 .
Create your Gamma distribution with your alpha_ and rate_ stochastic variables and the observed data.
Perform 10000 draws.

The zero problem could be because you are sampling zeros from exponential distribution.

Ah:

rate_est is 0.00021265346963636103

rate_ci = np.percentile(trace['rate_'], [2.5, 97.5]) rate_ci = [0.00022031, 0.00028109]

1/rate_est is 4702.486170152818

I can believe I am sampling zeros if I use rate_est.

I read https://discourse.pymc.io/t/model-diagnostics-for-mass-matrix-contains-zeros-on-the-diagonal/957/8 -- it's not in a language I understand. :-( — Vicki B, Sep 30 '19 at 02:51
I discovered that leaving alpha_est alone and only doing 1/rate_est also ran without errors. — Vicki B, Sep 30 '19 at 18:04

score 2 · Accepted Answer · answered Sep 30 '19 at 01:05

2

I have doubts about your 1/alpha step. See this discussion: https://discourse.pymc.io/t/help-with-fitting-gamma-distribution/2630

The zero problem could be because you are sampling zeros from exponential distribution.

You could look here: https://docs.pymc.io/notebooks/PyMC3_tips_and_heuristic.html cell[6]

I think you are okay with the sampler output. You can check your distributions by using traceplot.

answered Sep 30 '19 at 01:05

dgumo

1,838
1
14
18

Thanks. I wish I understood what you were saying, but I think i means the pink stuff is OK. – Vicki B Sep 30 '19 at 03:03
@VickiB Do not accept or upvote answers which did not satisfyingly help you. That way you might attract somebody who is prepared to explain more on your terms. – Yunnosch Sep 30 '19 at 05:15
1

@VickiB feel free to unaccept the answer. I will try to write a better answer for you. And you should not feel stupid :-) – dgumo Sep 30 '19 at 06:33
> The zero problem could be because you are sampling zeros from exponential distribution. --- ahah. I think so, yes. I do not need to invert alpha (although i doesn;t hurt anything). I do need to invert rate. Apparently, rae (not inverted) occasionally throws a 0. – Vicki B Sep 30 '19 at 23:29

PyMC3, NUTS sampler, what's happening here?

1 Answers1