I'm running a simple HMM using scikit-learn's hmmlearn module. It works for fully observed data, but it fails when I pass it observations with missing data. A small example:
import numpy as np
import hmmlearn
import hmmlearn.hmm as hmm
transmat = np.array([[0.9, 0.1],
                     [0.1, 0.9]])
emitmat = np.array([[0.5, 0.5],
                    [0.9, 0.1]])
# this does not work: cannot have missing data
obs = np.array([0, 1] * 5 + [np.nan] * 5)
# this works
#obs = np.array([0, 1] * 5 + [1] * 5)
startprob = np.array([0.5, 0.5])
h = hmm.MultinomialHMM(n_components=2,
                       startprob=startprob,
                       transmat=transmat)
h.emissionprob_ = emitmat
print obs, type(obs)
posteriors = h.predict_proba(obs)
print posteriors
If obs is fully observed (every element is 0 or 1) it works, but I would like to get estimates for the unobserved data points. I tried encoding these as np.nan or None, but neither works. It gives the error IndexError: arrays used as indices must be of integer (or boolean) type (in hmm.py, line 430, in _compute_log_likelihood).
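As far as I can tell, the error comes from the dtype rather than from the HMM itself: a single np.nan turns the whole observation array into floats, and a float array cannot be used as an index. The sketch below reproduces this with NumPy alone. It assumes that _compute_log_likelihood indexes the emission matrix by the observations, which I have not verified in the source.

```python
import numpy as np

emitmat = np.array([[0.5, 0.5],
                    [0.9, 0.1]])

# A single np.nan forces the whole array to a float dtype.
obs = np.array([0, 1] * 5 + [np.nan] * 5)
print(obs.dtype)

# Assumption: the library looks up emission columns by observation value.
# Doing the same lookup with a float array raises an IndexError.
try:
    emitmat[:, obs]
    error = None
except IndexError as exc:
    error = exc
print(type(error).__name__)

# The fully observed version is an integer array and indexes fine.
full = np.array([0, 1] * 5 + [1] * 5)
print(emitmat[:, full].shape)
```

So casting to int would not help either, since np.nan has no integer representation.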
How can this be done in hmmlearn?
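To make the goal concrete, here is a rough hand-rolled sketch in plain NumPy of the behaviour I am after. It does not use hmmlearn at all: posteriors_with_missing is a hypothetical helper of my own that runs a scaled forward-backward pass and treats every NaN observation as uninformative, giving it an emission likelihood of 1 for each state. I am not claiming this is how hmmlearn would or should implement it.

```python
import numpy as np

def posteriors_with_missing(obs, startprob, transmat, emitmat):
    """Scaled forward-backward pass; NaN entries in obs count as unobserved."""
    obs = np.asarray(obs, dtype=float)
    n_steps, n_states = obs.shape[0], startprob.shape[0]

    # Emission likelihood per step and state; 1 everywhere obs is missing.
    lik = np.ones((n_steps, n_states))
    seen = ~np.isnan(obs)
    lik[seen] = emitmat[:, obs[seen].astype(int)].T

    # Forward pass, normalising each step to avoid underflow.
    alpha = np.zeros((n_steps, n_states))
    scale = np.zeros(n_steps)
    alpha[0] = startprob * lik[0]
    scale[0] = alpha[0].sum()
    alpha[0] /= scale[0]
    for t in range(1, n_steps):
        alpha[t] = alpha[t - 1].dot(transmat) * lik[t]
        scale[t] = alpha[t].sum()
        alpha[t] /= scale[t]

    # Backward pass, reusing the forward scaling factors.
    beta = np.ones((n_steps, n_states))
    for t in range(n_steps - 2, -1, -1):
        beta[t] = transmat.dot(lik[t + 1] * beta[t + 1]) / scale[t + 1]

    gamma = alpha * beta
    return gamma / gamma.sum(axis=1, keepdims=True)

transmat = np.array([[0.9, 0.1],
                     [0.1, 0.9]])
emitmat = np.array([[0.5, 0.5],
                    [0.9, 0.1]])
startprob = np.array([0.5, 0.5])
obs = np.array([0, 1] * 5 + [np.nan] * 5)

post = posteriors_with_missing(obs, startprob, transmat, emitmat)
print(post)
```

For the trailing missing points this simply propagates the last informed state distribution through transmat, which is the kind of estimate I would expect to get back.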