
I believe I understand HMMs at their core. With an HMM we solve the evaluation problem (probability of an emitted sequence), the decoding problem (most probable hidden sequence), and the learning problem (estimating the transition and emission probability matrices from an observed set of emission sequences).
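For reference, this is roughly how those three problems map onto hmmlearn calls; a minimal sketch on made-up toy data (X_toy is only for illustration):

import numpy as np
from hmmlearn import hmm

# Toy 2-D observations, purely illustrative.
rng = np.random.RandomState(0)
X_toy = rng.randn(100, 2)

model = hmm.GaussianHMM(n_components=3, covariance_type="full", n_iter=50)
model.fit(X_toy)               # learning problem (unsupervised Baum-Welch)
loglik = model.score(X_toy)    # evaluation problem (log-likelihood of the emission sequence)
states = model.predict(X_toy)  # decoding problem (most probable hidden sequence, Viterbi)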

My issue is with the learning problem. I have emission sequences, but along with them I also have the associated hidden-state value for each observation (the number of distinct hidden states, however, is not known). In the usual HMM learning problem we estimate the hidden states (their number and the probability matrices) from the emission sequences alone (the number of hidden states can be optimized if it is not known in advance).

I am using the hmmlearn library for my computation. Of course, it does not have the option I want.

import os

import numpy as np
import pandas as pd

from hmmlearn import hmm

# dir_path is the directory containing the CSV files.
filenames = [f for f in os.listdir(dir_path) if '.csv' in f.lower()]
d1 = pd.read_csv(os.path.join(dir_path, filenames[0])).to_numpy()  # Shape = [m, 3] => first two columns are the features, last is the hidden-state label
d2 = pd.read_csv(os.path.join(dir_path, filenames[1])).to_numpy()  # Shape = [m, 3]


##
remodel = hmm.GaussianHMM(n_components=4, covariance_type="full", n_iter=100)

remodel.fit(d1[:, 0:2])  # The problem would be solved if there were a supervised option to pass the states as well

pred_1 = remodel.predict(d1[:, 0:2])
true_1 = d1[:, -1]  # Last column is the hidden-state label for the features in the first two columns.

pred_2 = remodel.predict(d2[:, 0:2])
true_2 = d2[:, -1]

Is there a way to do supervised learning with an HMM, and if so, how? If not, can I still solve my problem using an HMM, and if so, how?

zeal
1 Answer


hmmlearn does not implement supervised learning (hmmlearn#109).
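As a possible workaround (not a feature of hmmlearn itself): since the transition matrix and the per-state Gaussian parameters can be estimated directly from labelled data, you could compute them yourself and assign them to a GaussianHMM through its public attributes (startprob_, transmat_, means_, covars_), then use the model only for scoring and decoding. A rough sketch under those assumptions; the helper name and the smoothing constants are made up:

import numpy as np
from hmmlearn import hmm

def supervised_gaussian_hmm(X, y, n_states):
    # X: [n_samples, n_features] observations; y: non-negative integer state label per sample.
    n_samples, n_features = X.shape

    # Start probabilities: crude estimate from overall state frequencies.
    startprob = np.bincount(y, minlength=n_states).astype(float)
    startprob /= startprob.sum()

    # Transition probabilities: counts of consecutive label pairs, lightly smoothed.
    transmat = np.full((n_states, n_states), 1e-6)
    for a, b in zip(y[:-1], y[1:]):
        transmat[a, b] += 1
    transmat /= transmat.sum(axis=1, keepdims=True)

    # Emission parameters: per-state mean and covariance, with a small ridge.
    means = np.vstack([X[y == s].mean(axis=0) for s in range(n_states)])
    covars = np.stack([np.cov(X[y == s].T) + 1e-6 * np.eye(n_features)
                       for s in range(n_states)])

    model = hmm.GaussianHMM(n_components=n_states, covariance_type="full")
    model.startprob_ = startprob
    model.transmat_ = transmat
    model.means_ = means
    model.covars_ = covars
    return model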

The seqlearn library implements supervised HMMs, but does not seem to implement GMMs.

The pomegranate library, however, seems to implement supervised Hidden Markov Models with Gaussian Mixture Models. Something like this:

import pomegranate as pg

X = ...  # feature sequences (observations)
y = ...  # corresponding state labels for supervised training
distribution = pg.MultivariateGaussianDistribution
model = pg.HiddenMarkovModel.from_samples(distribution, n_components=5, X=X, labels=y, algorithm='labeled')
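The fitted model can then be used to decode new sequences; assuming pomegranate's HiddenMarkovModel exposes a predict method (it does in the 0.x API), something like this should work on the second file's feature columns from the question:

pred_2 = model.predict(d2[:, 0:2])  # most probable state per observation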
Jon Nordby