Avoiding optimization pitfalls when modeling an ordinal predicted variable in PyMC3

Question

I am trying to model an ordinal predicted variable using PyMC3 based on the approach in chapter 23 of Doing Bayesian Data Analysis. I would like to determine a good starting value using find_MAP, but am receiving an optimization error.

The model:

import pymc3 as pm
import numpy as np
import theano
import theano.tensor as tt

# Some helper functions
def cdf(x, location=0, scale=1):
    epsilon = np.array(1e-32, dtype=theano.config.floatX)

    location = tt.cast(location, theano.config.floatX)
    scale = tt.cast(scale, theano.config.floatX)

    div = tt.sqrt(2 * scale ** 2 + epsilon)
    div = tt.cast(div, theano.config.floatX)

    erf_arg = (x - location) / div
    return .5 * (1 + tt.erf(erf_arg + epsilon))


def percent_to_thresh(idx, vect):
    return 5 * tt.sum(vect[:idx + 1]) + 1.5


def full_thresh(thresh):
    idxs = tt.arange(thresh.shape[0] - 1)
    thresh_mod, updates = theano.scan(fn=percent_to_thresh,
                                      sequences=[idxs],
                                      non_sequences=[thresh])
    return tt.concatenate([[-1 * np.inf, 1.5], thresh_mod, [6.5, np.inf]])


def compute_ps(thresh, location, scale):
    f_thresh = full_thresh(thresh)
    return cdf(f_thresh[1:], location, scale) - cdf(f_thresh[:-1], location, scale)

# Generate data
real_ps = [0.05, 0.05, 0.1, 0.1, 0.2, 0.3, 0.2]
data = np.random.choice(7, size=1000, p=real_ps)

# Run model
with pm.Model() as model:
    mu = pm.Normal('mu', mu=4, sd=3)
    sigma = pm.Uniform('sigma', lower=0.1, upper=70)
    thresh = pm.Dirichlet('thresh', a=np.ones(5))

    cat_p = compute_ps(thresh, mu, sigma)

    results = pm.Categorical('results', p=cat_p, observed=data)

with model:
    start = pm.find_MAP()
    trace = pm.sample(2000, start=start)

When running this, I receive the following error:

Applied interval-transform to sigma and added transformed sigma_interval_ to model.
Applied stickbreaking-transform to thresh and added transformed thresh_stickbreaking_ to model.
Traceback (most recent call last):
  File "cm_net_log.v1-for_so.py", line 53, in <module>
    start = pm.find_MAP()
  File "/usr/local/lib/python3.5/site-packages/pymc3/tuning/starting.py", line 133, in find_MAP
    specific_errors)
ValueError: Optimization error: max, logp or dlogp at max have non-finite values. Some values may be outside of distribution support. max: {'thresh_stickbreaking_': array([-1.04298465, -0.48661088, -0.84326554, -0.44833646]), 'sigma_interval_': array(-2.220446049250313e-16), 'mu': array(7.68422528308479)} logp: array(-3506.530143064723) dlogp: array([  1.61013190e-06,              nan,  -6.73994118e-06,
        -6.93873894e-06,   6.03358122e-06,   3.18954680e-06])Check that 1) you don't have hierarchical parameters, these will lead to points with infinite density. 2) your distribution logp's are properly specified. Specific issues:

My questions:

1. How can I determine why dlogp is nan at certain points?
2. Is there a different way that I can express this model to avoid dlogp being nan?

Also worth noting:

This model runs fine if I don't find_MAP and use a Metropolis sampler. However, I'd like to have the flexibility of using other samplers as this model becomes more complex.
I have a suspicion that the issue is due to the relationship between the thresholds and the normal distribution, but I don't know how to disentangle them for the optimization.
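
One plausible source of the nan, not confirmed anywhere in this thread: full_thresh pads the thresholds with -inf and +inf, and cdf is evaluated at those edges. The CDF value itself is finite there, but the chain-rule derivative with respect to scale multiplies a vanishing erf slope by an infinite factor, and 0 * inf is nan in floating point. The sketch below reproduces that arithmetic with the standard library only. The names normal_cdf and dcdf_dscale are hypothetical, and the hand-written derivative is an assumption about how the symbolic gradient gets evaluated, not a trace of what Theano actually does.

```python
import math


def normal_cdf(x, location=0.0, scale=1.0):
    """Normal CDF written with erf, mirroring the question's cdf helper."""
    z = (x - location) / (scale * math.sqrt(2.0))
    return 0.5 * (1.0 + math.erf(z))


def dcdf_dscale(x, location=0.0, scale=1.0):
    """Chain-rule derivative of normal_cdf with respect to scale.

    d(cdf)/d(scale) = 0.5 * erf'(z) * dz/dscale,
    with erf'(z) = 2/sqrt(pi) * exp(-z**2) and dz/dscale = -z/scale.
    """
    z = (x - location) / (scale * math.sqrt(2.0))
    derf = 2.0 / math.sqrt(math.pi) * math.exp(-z * z)  # 0.0 when z is infinite
    dz_dscale = -z / scale                              # infinite when z is infinite
    return 0.5 * derf * dz_dscale                       # 0.0 * inf gives nan


# Finite threshold: value and derivative are both finite.
print(normal_cdf(6.5, 4.0, 2.0), dcdf_dscale(6.5, 4.0, 2.0))

# Infinite threshold: the value is fine, the derivative is not.
print(normal_cdf(math.inf, 4.0, 2.0), dcdf_dscale(math.inf, 4.0, 2.0))
```

If this is indeed the cause, one option is to keep the infinite edges out of the differentiated expression, for example by treating the outermost cumulative probabilities as the constants 0 and 1.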

Jordi · Answer 1 · 2017-04-18T14:09:08.353

Regarding question 2: I expressed the model for the ordinal predicted variable (single group) differently. I used the Theano @as_op decorator for a function that calculates the probabilities of the outcomes. That also explains why I cannot use find_MAP() or gradient-based samplers: Theano cannot calculate a gradient for the custom function. (http://pymc-devs.github.io/pymc3/notebooks/getting_started.html#Arbitrary-deterministics)

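import numpy as np
import pymc3 as pm
import theano.tensor as tt
from scipy.stats import norm
from theano.compile.ops import as_op

# df is assumed to be a pandas DataFrame whose column Y is an ordered categorical
# Number of outcomes
nYlevels = df.Y.cat.categories.size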

thresh = [k + .5 for k in range(1, nYlevels)]
thresh_obs = np.ma.asarray(thresh)
thresh_obs[1:-1] = np.ma.masked

@as_op(itypes=[tt.dvector, tt.dscalar, tt.dscalar], otypes=[tt.dvector])
def outcome_probabilities(theta, mu, sigma):
    out = np.empty(nYlevels)
    n = norm(loc=mu, scale=sigma)       
    out[0] = n.cdf(theta[0])        
    out[1] = np.max([0, n.cdf(theta[1]) - n.cdf(theta[0])])
    out[2] = np.max([0, n.cdf(theta[2]) - n.cdf(theta[1])])
    out[3] = np.max([0, n.cdf(theta[3]) - n.cdf(theta[2])])
    out[4] = np.max([0, n.cdf(theta[4]) - n.cdf(theta[3])])
    out[5] = np.max([0, n.cdf(theta[5]) - n.cdf(theta[4])])
    out[6] = 1 - n.cdf(theta[5])
    return out

with pm.Model() as ordinal_model_single:    

    theta = pm.Normal('theta', mu=thresh, tau=np.repeat(.5**2, len(thresh)),
                      shape=len(thresh), observed=thresh_obs, testval=thresh[1:-1])

    mu = pm.Normal('mu', mu=nYlevels/2.0, tau=1.0/(nYlevels**2))
    sigma = pm.Uniform('sigma', nYlevels/1000.0, nYlevels*10.0)

    pr = outcome_probabilities(theta, mu, sigma)

    # .values replaces as_matrix(), which newer pandas versions no longer provide
    y = pm.Categorical('y', pr, observed=df.Y.cat.codes.values)

http://nbviewer.jupyter.org/github/JWarmenhoven/DBDA-python/blob/master/Notebooks/Chapter%2023.ipynb
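
For readers who want to avoid the seven hard-coded lines, here is a loop-based sketch of the same probability calculation. It is not the code from the notebook: outcome_probabilities_sketch is a hypothetical name, math.erf stands in for scipy's norm.cdf, and the cumulative probabilities are padded with the constants 0 and 1 instead of being evaluated at infinite thresholds. Like the as_op version, it is plain Python and provides no gradient.

```python
import math


def normal_cdf(x, mu, sigma):
    """Normal CDF via erf (stand-in for scipy.stats.norm(mu, sigma).cdf)."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def outcome_probabilities_sketch(theta, mu, sigma):
    """Probability of each ordinal outcome given thresholds theta.

    The cumulative probabilities are padded with 0 and 1, so the first and
    last outcomes cover the two open-ended tails. Negative differences, which
    can occur if the thresholds are not increasing, are clamped to 0.
    """
    cum = [0.0] + [normal_cdf(t, mu, sigma) for t in theta] + [1.0]
    return [max(0.0, hi - lo) for lo, hi in zip(cum[:-1], cum[1:])]


# Seven outcomes, thresholds at 1.5, 2.5, ..., 6.5 as in the answer above.
theta = [k + 0.5 for k in range(1, 7)]
probs = outcome_probabilities_sketch(theta, mu=4.0, sigma=2.0)
print(probs, sum(probs))
```

When the thresholds are increasing, the probabilities sum to 1. If the clamp ever triggers, they no longer do, so it may be worth checking the threshold order separately.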
