0

I'm new to PyMC3, Theano, and numpy. Was just trying to duplicate the first 'hidden' Markov Model in the Stan manual--the one in which the states are actually observed. But, I keep running into errors having to do with Theano, numpy, and perhaps what is going on behind PyMC3 distributions, which seem a bit mysterious to me. My code for the model is below:

import pandas as pd
dat_hmm = pd.read_csv('hmmVals.csv')
emission=dat_hmm.emission.values
state=dat_hmm.state.values

from pymc3 import Model, Dirichlet, Categorical
import numpy as np

basic_model = Model()

with basic_model:
    #Model constants:
    #num unique hidden states, num unique emissions, num instances
    K=3; V=9; T=10 
    alpha=np.ones(K); beta=np.ones(V)
    # Priors for unknown model parameters
    theta = np.empty(K, dtype=object) #theta=transmission
    phi = np.empty(K, dtype=object) #phi=emission
    #observed emission, state:
    w=np.empty(T, dtype=object); z=np.empty(T, dtype=object);
    for k in range(K):
        theta[k]=Dirichlet('theta'+str(k), alpha)
        phi[k]=Dirichlet('phi'+str(k), beta)
    # Likelihood (sampling distribution) of observations
    for t in range(T):
        w[t]=Categorical('w'+str(t),theta[state[t]], shape=1, observed=emission[t])
    for t in range(2, T):
        z[t]=Categorical('z'+str(t),phi[state[t-1]], shape=1,  observed=state[t])

The line "w[t]=Categorical('w'+str(t),theta[state[t]], shape=1, observed=emission[t])" generates the error, but not on t=0, which fills in w0, but on t=1 which generates an index out of bound error. There is no index out of bound in the code line itself because state[1], theta[state[t]], and emission[t] all exist. The error messages are:

    Traceback (most recent call last):
  File "/usr/local/lib/python3.4/dist-packages/pymc3/distributions/distribution.py", line 25, in __new__
    return model.Var(name, dist, data)
  File "/usr/local/lib/python3.4/dist-packages/pymc3/model.py", line 306, in Var
    var = ObservedRV(name=name, data=data, distribution=dist, model=self)
  File "/usr/local/lib/python3.4/dist-packages/pymc3/model.py", line 581, in __init__
    self.logp_elemwiset = distribution.logp(data)
  File "/usr/local/lib/python3.4/dist-packages/pymc3/distributions/discrete.py", line 400, in logp
    a = tt.log(p[value])
  File "/usr/local/lib/python3.4/dist-packages/theano/tensor/var.py", line 532, in __getitem__
    lambda entry: isinstance(entry, Variable)))
  File "/usr/local/lib/python3.4/dist-packages/theano/gof/op.py", line 668, in __call__
    required = thunk()
  File "/usr/local/lib/python3.4/dist-packages/theano/gof/op.py", line 883, in rval
    fill_storage()
  File "/usr/local/lib/python3.4/dist-packages/theano/gof/cc.py", line 1707, in __call__
    reraise(exc_type, exc_value, exc_trace)
  File "/usr/local/lib/python3.4/dist-packages/six.py", line 686, in reraise
    raise value
IndexError: index out of bounds

I don't know about the wisdom of sticking numpy objects into PyMC3 distributions or using the result of that to try to parameterize another distribution, but I have seen somewhat similar code on the web, minus the last part. Is there perhaps no good way to code such a hidden Markov model in PyMC3 yet?

JasonK
  • 113
  • 7

2 Answers2

2

I have found a way to fix the above error. The following code works--no errors and I'm able to get correct parameter estimates with Metropolis at least.

I made two mistakes and didn't realize they were so simple because I expected something complicated to be happening in Theano. One is that my data was set up for Stan and so indexed to start at 1 rather than 0. Python indexes everything to 0. I changed the data file by subtracting 1 from every value. The other error was I used theta, the transmission matrix, to calculate individual emissions and vice versa for the phi matrix. Theta was too short for the emissions.

What I wish I understood now was why the NUTS sampler keeps telling me I have a non-positive definite scaling, even though I'm feeding it MAP estimates. Metropolis works, but is slow-- about 11 minutes for these 300 observations and 1000 samples. The other mystery is why PyMC3 thinks it only took a couple seconds to calculate the samples.

import pandas as pd

dat_hmm = pd.read_csv('hmmVals.csv')

emission=dat_hmm.emission.values
state=dat_hmm.state.values

from pymc3 import Model, Dirichlet, Categorical
import numpy as np

basic_model = Model()

with basic_model:
    #Model constants:
    K=3; V=9; T=300 #num unique hidden states, num unique emissions, num instances
    alpha=np.ones(K); beta=np.ones(V)
    # Priors for unknown model parameters
    theta = np.empty(K, dtype=object) #theta=transmission
    phi = np.empty(K, dtype=object) #phi=emission
    w=np.empty(T, dtype=object); z=np.empty(T, dtype=object); #observed emission, state
    for k in range(K):
        theta[k]=Dirichlet('theta'+str(k), alpha)
        phi[k]=Dirichlet('phi'+str(k), beta)
    #Likelihood (sampling distribution) of observationss
    for t in range(2, T):
        z[t]=Categorical('z'+str(t),theta[state[t-1]], shape=1,  observed=state[t])
    for t in range(T):
        w[t]=Categorical('w'+str(t),phi[state[t]], shape=1, observed=emission[t])
JasonK
  • 113
  • 7
  • PS If this solution was helpful to you (it looks like a lot of people are running into problems implementing HMM models in PyMC3), please 'vote' for my answer. – JasonK Aug 07 '16 at 14:00
  • You can choose you own answer as the right answer. I have not checked your code, but did you try vectorizing it? Probably most of the time is spent on compiling the model and finding MAP and the sampling is pretty fast. – aloctavodia Aug 12 '16 at 07:17
  • Thank aloctavodia and sorry for my slow response (was on vacation). The above doesn't work with NUTS and Metropolis takes too long to allow me to work w/ much more data & complicated models. Vectorizing might help, but I apparently haven't figured out how to vectorize. Any help would be appreciated. I tried to take z out of a loop and give it all of theta, hoping that would partly vectorize the model, but that just gives an error msg. – JasonK Aug 27 '16 at 22:02
1

I have also tried to implement HMMs in pymc3 and I had run into similar problems. I just found a way to implement a two level HMM in a vectorized fashion (to be perfectly honest, my model is not hidden but the hidden part can be added easily - I vectorized the description of the state variable). I am not sure whether this is the most efficient way, but I tested this code against a simple for loop for defining the states. The code below runs in less than a minute with 1000 data points whereas the for loop took several hours.

Here is the code:

import numpy as np
import theano.tensor as tt
import pymc3 as pm

class HMMStates(pm.Discrete):
    """
    Hidden Markov Model States
    Parameters
    ----------
    P1 : tensor
        probability to remain in state 1
    P2 : tensor
        probability to move from state 2 to state 1

    """

    def __init__(self, PA=None, P1=None, P2=None,
                 *args, **kwargs):
        super(HMMStates, self).__init__(*args, **kwargs)
        self.PA = PA
        self.P1 = P1
        self.P2 = P2
        self.mean = 0.

    def logp(self, x):
        PA = self.PA
        P1 = self.P1
        P2 = self.P2

        # now we need to create an array with probabilities
        # so that for x=A: PA=P1, PB=(1-P1)
        # and for x=B: PA=P2, PB=(1-P2)
        length = x.shape[0]
        P1T = tt.tile(P1,(length-1,1)).T
        P2T = tt.tile(P2,(length-1,1)).T

        P = tt.switch(x[:-1],P1T,P2T).T

        x_i = x[1:]
        ou_like = pm.Categorical.dist(P).logp(x_i)
        return pm.Categorical.dist(PA).logp(x[0]) + tt.sum(ou_like)

This class creates the states of the HMM. To call it you can do the following:

theta = np.ones(2) # prior for probabilities
with pm.Model() as model:
    # 2 state model
    # P1 is probablility to stay in state 1
    # P2 is probability to move from state 2 to state 1
    P1 = pm.Dirichlet('P1', a=theta)
    P2 = pm.Dirichlet('P2', a=theta)

    PA = pm.Deterministic('PA',P2/(P2+1-P1))

    states = HMMStates('states',PA,P1,P2, observed=data)

    start = pm.find_MAP()

    trace = pm.sample(5000, start=start)

just to show how the old code looks like:

with pm.Model() as model:
    # 2 state model
    # P1 is probablility to stay in state 1
    # P2 is probability to move from state 2 to state 1
    P1 = pm.Dirichlet('P1', a=np.ones(2))
    P2 = pm.Dirichlet('P2', a=np.ones(2))

    PA = pm.Deterministic('PA',P2/(P2+1-P1))

    state = pm.Categorical('state0',PA, observed=data[0])
    for i in range(1,N_chain):
        state = pm.Categorical('state'+str(i), tt.switch(data[i-1],P1,P2), observed=data[i])

    start = pm.find_MAP()

    trace = pm.sample(5000, start=start)
Helmut Strey
  • 100
  • 8