I am implementing a Personalized Mixture of Multivariate Gaussian Regressions in pymc3 and running into an issue with empty components. After referring to the related PyMC3 mixture model example, I tried implementing the model using univariate normals instead, but I've had some issues there as well.

I've tried several strategies to constrain each component to be non-empty, but each has failed. These are shown in the code below. My specific question is: What is the best way to constrain all components to be non-empty in a mixture of multivariate Gaussians using pymc3?

Note that attempt #1 in the code below comes from the Mixture Model in PyMC3 Example and does not work here.

You can replicate the synthetic data I am using with the function in this gist.
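
If the gist is unavailable, a stand-in generator along these lines defines everything the model below expects: X, y, I, K, and std. This is a hypothetical sketch of the data layout, not the gist's actual function.

import numpy as np

def make_synthetic_data(n_pe=20, obs_per_pe=5, F=3, K=3, std=1.0, seed=42):
    # Hypothetical stand-in for the gist: K regression components,
    # with each primary entity assigned to exactly one true component.
    rng = np.random.RandomState(seed)
    W_true = rng.randn(K, F) * std                 # per-component coefficients
    z_true = rng.randint(0, K, size=n_pe)          # true component per entity
    pids = np.repeat(np.arange(n_pe), obs_per_pe)  # entity id per observation
    X = rng.randn(n_pe * obs_per_pe, F)            # features
    y = (X * W_true[z_true[pids]]).sum(axis=1) + rng.randn(len(pids)) * 0.1
    I = pids[:, None]                              # entity ids in column 0
    return X, y, I, K, std

X, y, I, K, std = make_synthetic_data()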

import pymc3 as pm
import numpy as np
import theano
import theano.tensor as T
from scipy import stats

# Extract problem dimensions.
N = X.shape[0]  # number of samples
F = X.shape[1]  # number of features
pids = I[:, 0].astype(np.int)  # primary entity ids
uniq_pids = np.unique(pids)  # array of unique primary entity ids
n_pe = len(uniq_pids)  # number of primary entities

with pm.Model() as gmreg:
    # Init hyperparameters.
    a0 = 1
    b0 = 1
    mu0 = pm.constant(np.zeros(F))
    alpha = pm.constant(np.ones(K))
    coeff_precisions = pm.constant(1 / X.var(0))

    # Init parameters.
    # Dirichlet shape parameter, prior on indicators.
    pi = pm.Dirichlet(
        'pi', a=alpha, shape=K)

    # ATTEMPT 1: Make probability of membership for each cluster >= 0.1
    # ================================================================
    pi_min_potential = pm.Potential(
        'pi_min_potential', T.switch(T.min(pi) < .1, -np.inf, 0))
    # ================================================================

    # The multinomial (and by extension, the categorical) is a symmetric
    # distribution. Using it as a prior for the indicator variables Z
    # makes the likelihood invariant under the many possible permutations
    # of the component indices, and this invariance is inherited by the
    # posterior. The model is therefore unidentifiable, which induces
    # label switching during inference.

    # Resolve by ordering the components to have increasing weights.
    # This does not deal with the parameter identifiability issue.
    order_pi_potential = pm.Potential(
        'order_pi_potential',
        T.sum([T.switch(pi[k] - pi[k-1] < 0, -np.inf, 0)
               for k in range(1, K)]))

    # Indicators, specifying which cluster each primary entity belongs to.
    # These are draws from Multinomial with 1 trial.
    init_pi = stats.dirichlet.rvs(alpha.eval())[0]
    test_Z = np.random.multinomial(n=1, pvals=init_pi, size=n_pe)
    as_cat = np.nonzero(test_Z)[1]
    Z = pm.Categorical(
        'Z', p=pi, shape=n_pe, testval=as_cat)

    # ATTEMPT 2: Give infinite negative likelihood to the case
    # where any of the clusters have no users assigned.
    # ================================================================
    # sizes = [T.eq(Z, k).nonzero()[0].shape[0] for k in range(K)]
    # nonempty_potential = pm.Potential(
    #     'comp_nonempty_potential',
    #     np.sum([T.switch(sizes[k] < 1, -np.inf, 0) for k in range(K)]))
    # ================================================================

    # ATTEMPT 3: Add same sample to each cluster, each has at least 1.
    # ================================================================
    # shared_X = X.mean(0)[None, :]
    # shared_y = y.mean().reshape(1)
    # X = T.concatenate((shared_X.repeat(K).reshape(K, F), X))
    # y = T.concatenate((shared_y.repeat(K), y))

    # Add range(K) on to the beginning to include shared instance.
    # Z_expanded = Z[pids]
    # Z_with_shared = T.concatenate((range(K), Z_expanded))
    # pid_idx = pm.Deterministic('pid_idx', Z_with_shared)
    # ================================================================

    # Expand user cluster indicators to each observation for each user.
    pid_idx = pm.Deterministic('pid_idx', Z[pids])

    # Construct masks for each component.
    masks = [T.eq(pid_idx, k).nonzero() for k in range(K)]
    comp_sizes = [masks[k][0].shape[0] for k in range(K)]

    # Component regression precision parameters.
    beta = pm.Gamma(
        'beta', alpha=a0, beta=b0, shape=(K,),
        testval=np.random.gamma(a0, b0, size=K))
    # Regression coefficient matrix, with coeffs for each component.
    W = pm.MvNormal(
        'W', mu=mu0, tau=T.diag(coeff_precisions), shape=(K, F),
        testval=np.random.randn(K, F) * std)

    # The mean of the observations is the result of a regression, with
    # coefficients determined by the cluster the sample belongs to.

    # Now we have K different multivariate normal distributions.
    X = T.cast(X, 'float64')
    y = T.cast(y, 'float64')
    comps = []
    for k in range(K):
        mask_k = masks[k]

        X_k = X[mask_k]
        y_k = y[mask_k]

        n_k = comp_sizes[k]
        precision_matrix = beta[k] * T.eye(n_k)

        comp_k = pm.MvNormal(
            'comp_%d' % k,
            mu=T.dot(X_k, W[k]), tau=precision_matrix,
            observed=y_k)
        comps.append(comp_k)

The first two approaches fail to ensure non-empty clusters; attempting to sample results in a LinAlgError:

with gmreg:
    step1 = pm.Metropolis(vars=[pi, beta, W])
    step2 = pm.ElemwiseCategoricalStep(vars=[Z], values=np.arange(K))
    tr = pm.sample(100, step=[step1, step2])

Failed to compute determinant []
---------------------------------------------------------------------------
LinAlgError                               Traceback (most recent call last)
<ipython-input-2-c7df53f4c6a5> in <module>()
      2     step1 = pm.Metropolis(vars=[pi, beta, W])
      3     step2 = pm.ElemwiseCategoricalStep(vars=[Z], values=np.arange(K))
----> 4     tr = pm.sample(100, step=[step1, step2])
      5 

/home/mack/anaconda/lib/python2.7/site-packages/pymc3/sampling.pyc in sample(draws, step, start, trace, chain, njobs, tune, progressbar, model, random_seed)
    155         sample_args = [draws, step, start, trace, chain,
    156                        tune, progressbar, model, random_seed]
--> 157     return sample_func(*sample_args)
    158 
    159 

/home/mack/anaconda/lib/python2.7/site-packages/pymc3/sampling.pyc in _sample(draws, step, start, trace, chain, tune, progressbar, model, random_seed)
    164     progress = progress_bar(draws)
    165     try:
--> 166         for i, strace in enumerate(sampling):
    167             if progressbar:
    168                 progress.update(i)

/home/mack/anaconda/lib/python2.7/site-packages/pymc3/sampling.pyc in _iter_sample(draws, step, start, trace, chain, tune, model, random_seed)
    246         if i == tune:
    247             step = stop_tuning(step)
--> 248         point = step.step(point)
    249         strace.record(point)
    250         yield strace

/home/mack/anaconda/lib/python2.7/site-packages/pymc3/step_methods/compound.pyc in step(self, point)
     12     def step(self, point):
     13         for method in self.methods:
---> 14             point = method.step(point)
     15         return point

/home/mack/anaconda/lib/python2.7/site-packages/pymc3/step_methods/arraystep.pyc in step(self, point)
     87             inputs += [point]
     88 
---> 89         apoint = self.astep(bij.map(point), *inputs)
     90         return bij.rmap(apoint)
     91 

/home/mack/anaconda/lib/python2.7/site-packages/pymc3/step_methods/gibbs.pyc in astep(self, q, logp)
     38 
     39     def astep(self, q, logp):
---> 40         p = array([logp(v * self.sh) for v in self.values])
     41         return categorical(p, self.var.dshape)
     42 

/home/mack/anaconda/lib/python2.7/site-packages/pymc3/blocking.pyc in __call__(self, x)
    117 
    118     def __call__(self, x):
--> 119         return self.fa(self.fb(x))

/home/mack/anaconda/lib/python2.7/site-packages/pymc3/model.pyc in __call__(self, *args, **kwargs)
    423     def __call__(self, *args, **kwargs):
    424         point = Point(model=self.model, *args, **kwargs)
--> 425         return self.f(**point)
    426 
    427 compilef = fastfn

/home/mack/anaconda/lib/python2.7/site-packages/theano/compile/function_module.pyc in __call__(self, *args, **kwargs)
    604                         self.fn.nodes[self.fn.position_of_error],
    605                         self.fn.thunks[self.fn.position_of_error],
--> 606                         storage_map=self.fn.storage_map)
    607                 else:
    608                     # For the c linker We don't have access from

/home/mack/anaconda/lib/python2.7/site-packages/theano/compile/function_module.pyc in __call__(self, *args, **kwargs)
    593         t0_fn = time.time()
    594         try:
--> 595             outputs = self.fn()
    596         except Exception:
    597             if hasattr(self.fn, 'position_of_error'):

/home/mack/anaconda/lib/python2.7/site-packages/theano/gof/op.pyc in rval(p, i, o, n)
    766             # default arguments are stored in the closure of `rval`
    767             def rval(p=p, i=node_input_storage, o=node_output_storage, n=node):
--> 768                 r = p(n, [x[0] for x in i], o)
    769                 for o in node.outputs:
    770                     compute_map[o][0] = True

/home/mack/anaconda/lib/python2.7/site-packages/theano/tensor/nlinalg.pyc in perform(self, node, (x,), (z,))
    267     def perform(self, node, (x,), (z, )):
    268         try:
--> 269             z[0] = numpy.asarray(numpy.linalg.det(x), dtype=x.dtype)
    270         except Exception:
    271             print 'Failed to compute determinant', x

/home/mack/anaconda/lib/python2.7/site-packages/numpy/linalg/linalg.pyc in det(a)
   1769     """
   1770     a = asarray(a)
-> 1771     _assertNoEmpty2d(a)
   1772     _assertRankAtLeast2(a)
   1773     _assertNdSquareness(a)

/home/mack/anaconda/lib/python2.7/site-packages/numpy/linalg/linalg.pyc in _assertNoEmpty2d(*arrays)
    220     for a in arrays:
    221         if a.size == 0 and product(a.shape[-2:]) == 0:
--> 222             raise LinAlgError("Arrays cannot be empty")
    223 
    224 

LinAlgError: Arrays cannot be empty
Apply node that caused the error: Det(Elemwise{Mul}[(0, 1)].0)
Inputs types: [TensorType(float64, matrix)]
Inputs shapes: [(0, 0)]
Inputs strides: [(8, 8)]
Inputs values: [array([], shape=(0, 0), dtype=float64)]

Backtrace when the node is created:
  File "/home/mack/anaconda/lib/python2.7/site-packages/pymc3/distributions/multivariate.py", line 66, in logp
    result = k * T.log(2 * np.pi) + T.log(1./det(tau))

HINT: Use the Theano flag 'exception_verbosity=high' for a debugprint and storage map footprint of this apply node.

...which indicates the component is empty, since the precision matrix has shape (0, 0).
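
You can reproduce this failure mode in isolation. With the NumPy version in the traceback, the determinant of a 0x0 matrix raises immediately; recent NumPy releases instead define the determinant of an empty matrix as 1.0, so this reproduction is version-dependent:

import numpy as np

# What the MvNormal logp hits when a component's mask selects no rows:
np.linalg.det(np.empty((0, 0)))  # LinAlgError: Arrays cannot be empty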

The third method actually resolves the empty-component issue, but it gives very strange inference behavior. I selected a burn-in based on the traceplots and thinned to every 10th sample. The samples are still highly autocorrelated, though much less so than without thinning. At that point, I tallied the Z values across the samples, and this is what I get:

In [3]: with gmreg:
    step1 = pm.Metropolis(vars=[pi, beta, W])
    step2 = pm.ElemwiseCategoricalStep(vars=[Z], values=np.arange(K))
    tr = pm.sample(1000, step=[step1, step2])
   ...:     
 [-----------------100%-----------------] 1000 of 1000 complete in 258.8 sec

...

In [24]: zvals = tr[300::10]['Z']

In [25]: np.array([np.bincount(zvals[:, n]) for n in range(nusers)])
Out[25]: 
array([[ 0,  0, 70],
       [ 0,  0, 70],
       [ 0,  0, 70],
       [ 0,  0, 70],
       [ 0,  0, 70],
       [ 0,  0, 70],
       [ 0,  0, 70],
       [ 0,  0, 70],
       [ 0,  0, 70],
       [ 0,  0, 70],
       [ 0,  0, 70],
       [ 0,  0, 70],
       [ 0,  0, 70],
       [ 0,  0, 70],
       [ 0,  0, 70],
       [ 0,  0, 70],
       [ 0,  0, 70],
       [ 0,  0, 70],
       [ 0,  0, 70],
       [ 0,  0, 70]])

So for some reason, all of the users are being assigned to the last cluster for every sample.
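
A quick way to confirm this from the trace itself is to count, per draw, how many entities land in each component. A small check along these lines (using the session variables above) flags any component that ever goes empty:

zvals = tr[300::10]['Z']  # burned-in, thinned assignment draws
counts = np.array([np.bincount(zvals[s], minlength=K)
                   for s in range(zvals.shape[0])])
print((counts == 0).any(axis=0))  # True for each component that was ever empty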

Mack
  • Any luck on this? I'm just trying to figure out how to use infinite clusters ... – O.rka Jan 03 '17 at 22:25
  • Besides numerical imprecision, shouldn't all of the entries in `pm.Dirichlet('pi', a=alpha, shape=K)` be non-zero? The support for the Dirichlet is 0 < x_i < 1 for all i entries. – bsmith89 Mar 23 '17 at 13:13
  • Also, in my experience instead of shape `K` in `pm.Dirichlet('pi', a=alpha, shape=K)`, I think you want `pm.Dirichlet('pi', a=alpha, shape=(len(alpha), K))`. ... Or maybe it's `pm.Dirichlet('pi', a=alpha, shape=(K, len(alpha)))`; I forget. – bsmith89 Mar 23 '17 at 13:15

1 Answer

I have run into a similar problem. Something like this worked for a mixture of multivariate Gaussians model. As for whether it's the best way, it's certainly the best solution I've found.

pm.Potential('pi_min_potential',
             T.switch(T.all([pi[i, 0] < 0.1 for i in range(K)]),
                      -np.inf, 0))

The key here is that you need to account for each component weight that falls below your cutoff. Further, you should adjust the shape of your pi distribution, as mentioned in the comments; that change affects the indexing inside the T.switch call (hence the pi[i, 0]).
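
Note the semantic difference from attempt 1 in the question: T.min(pi) < .1 assigns -inf as soon as any weight drops below the cutoff, whereas the T.all form above only fires when every weight is below it. Written side by side as a sketch (assuming pi keeps the 1-D shape K from the question):

# Fires when ANY component weight is below the cutoff (attempt 1's logic):
any_low = pm.Potential('any_low', T.switch(T.min(pi) < 0.1, -np.inf, 0))

# Fires only when ALL component weights are below the cutoff (as above):
all_low = pm.Potential('all_low', T.switch(T.all(pi < 0.1), -np.inf, 0))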

R. Barrett