I'm new to PyMC3, Theano, and numpy. Was just trying to duplicate the first 'hidden' Markov Model in the Stan manual--the one in which the states are actually observed. But, I keep running into errors having to do with Theano, numpy, and perhaps what is going on behind PyMC3 distributions, which seem a bit mysterious to me. My code for the model is below:
import pandas as pd
dat_hmm = pd.read_csv('hmmVals.csv')
emission=dat_hmm.emission.values
state=dat_hmm.state.values
from pymc3 import Model, Dirichlet, Categorical
import numpy as np
basic_model = Model()
with basic_model:
#Model constants:
#num unique hidden states, num unique emissions, num instances
K=3; V=9; T=10
alpha=np.ones(K); beta=np.ones(V)
# Priors for unknown model parameters
theta = np.empty(K, dtype=object) #theta=transmission
phi = np.empty(K, dtype=object) #phi=emission
#observed emission, state:
w=np.empty(T, dtype=object); z=np.empty(T, dtype=object);
for k in range(K):
theta[k]=Dirichlet('theta'+str(k), alpha)
phi[k]=Dirichlet('phi'+str(k), beta)
# Likelihood (sampling distribution) of observations
for t in range(T):
w[t]=Categorical('w'+str(t),theta[state[t]], shape=1, observed=emission[t])
for t in range(2, T):
z[t]=Categorical('z'+str(t),phi[state[t-1]], shape=1, observed=state[t])
The line "w[t]=Categorical('w'+str(t),theta[state[t]], shape=1, observed=emission[t])" generates the error, but not on t=0, which fills in w0, but on t=1 which generates an index out of bound error. There is no index out of bound in the code line itself because state[1], theta[state[t]], and emission[t] all exist. The error messages are:
Traceback (most recent call last):
File "/usr/local/lib/python3.4/dist-packages/pymc3/distributions/distribution.py", line 25, in __new__
return model.Var(name, dist, data)
File "/usr/local/lib/python3.4/dist-packages/pymc3/model.py", line 306, in Var
var = ObservedRV(name=name, data=data, distribution=dist, model=self)
File "/usr/local/lib/python3.4/dist-packages/pymc3/model.py", line 581, in __init__
self.logp_elemwiset = distribution.logp(data)
File "/usr/local/lib/python3.4/dist-packages/pymc3/distributions/discrete.py", line 400, in logp
a = tt.log(p[value])
File "/usr/local/lib/python3.4/dist-packages/theano/tensor/var.py", line 532, in __getitem__
lambda entry: isinstance(entry, Variable)))
File "/usr/local/lib/python3.4/dist-packages/theano/gof/op.py", line 668, in __call__
required = thunk()
File "/usr/local/lib/python3.4/dist-packages/theano/gof/op.py", line 883, in rval
fill_storage()
File "/usr/local/lib/python3.4/dist-packages/theano/gof/cc.py", line 1707, in __call__
reraise(exc_type, exc_value, exc_trace)
File "/usr/local/lib/python3.4/dist-packages/six.py", line 686, in reraise
raise value
IndexError: index out of bounds
I don't know about the wisdom of sticking numpy objects into PyMC3 distributions or using the result of that to try to parameterize another distribution, but I have seen somewhat similar code on the web, minus the last part. Is there perhaps no good way to code such a hidden Markov model in PyMC3 yet?