I am upskilling on hidden Markov models and came across a Python package called hmmlearn. I have been playing with the multinomial example here: https://hmmlearn.readthedocs.io/en/latest/auto_examples/plot_multinomial_hmm.html

I had a question about the expected output for the first item in the sequence of predictions:

['dog', 'dog', 'dog', 'dog', 'dog', 'cat', 'cat', 'dog', 'dog', 'cat', 'dog', 'dog', 'dog', 'dog', 'dog', 'cat', 'cat', 'dog', 'dog', 'cat', 'dog', 'dog', 'dog', 'dog', 'dog', 'cat', 'cat', 'dog', 'dog', 'cat', 'dog', 'dog', 'dog', 'dog', 'dog', 'cat', 'cat', 'dog', 'dog', 'cat', 'dog', 'dog', 'dog', 'dog', 'dog', 'cat', 'cat', 'dog', 'dog', 'cat']

If we use the following starting probabilities:

import numpy as np

# For this example, we will model the stages of a conversation,
# where each sentence is "generated" with an underlying topic, "cat" or "dog"
states = ["cat", "dog"]
# map state index -> topic name, i.e. {0: "cat", 1: "dog"}
id2topic = dict(zip(range(len(states)), states))
# we are more likely to talk about cats first
start_probs = np.array([0.9, 0.1])

I would expect cat to be predicted as the first state nine times out of ten.
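To make that expectation concrete, here is a minimal sketch in plain Python (it does not use hmmlearn). It only draws the first hidden state from `start_probs` many times and reports how often it is cat. The seed and the number of runs (`n_runs`) are arbitrary choices for illustration.

```python
import random

states = ["cat", "dog"]
start_probs = [0.9, 0.1]  # same values as in the example above

rng = random.Random(0)  # fixed seed so the run is repeatable (arbitrary choice)
n_runs = 10_000         # illustrative number of repetitions

# Draw only the FIRST hidden state of each simulated conversation.
first_states = rng.choices(states, weights=start_probs, k=n_runs)
cat_fraction = first_states.count("cat") / n_runs
print(f"fraction of runs starting in 'cat': {cat_fraction:.3f}")
```

Note that this checks the generating process only; it says nothing yet about what a decoder infers after it has seen the observations.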

Using the existing emission and transition matrices in the example:

emission_probs = np.array([[0.25, 0.1, 0.4, 0.25],
                           [0.2, 0.5, 0.1, 0.2]])

# Also assume it's more likely to stay in a state than transition to the other
trans_mat = np.array([[0.8, 0.2], [0.2, 0.8]])

I output the list of decoded probabilities using the following line:

print(new_model.predict_proba(sequences))

When I run the code eleven times, the probabilities for the first item are as follows:

[7.40189743e-135 1.00000000e+000]  (dog)
[1.00000000e+00 1.70108469e-25]    (cat)
[9.05027537e-93 1.00000000e+00]    (dog)
[1.00000000e+000 2.17614843e-109]  (cat)
[1.08324620e-117 1.00000000e+000]  (dog)
[1.59606071e-22 1.00000000e+00]    (dog)
[1.00000000e+00 0.00000000e+00]    (cat)
[1.57341950e-37 1.00000000e+00]    (dog)
[1.00000000e+00 4.42527268e-67]    (cat)
[8.82720501e-107 1.00000000e+000]  (dog)
[1.94074438e-120 1.00000000e+000]  (dog)

I would have expected the starting state to be cat nine times out of ten. I am wondering whether I need to run the code more often, say 100 or 1000 times, before cat shows up as the starting state at the expected rate (about 90 times out of 100, for example).
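One thing that may matter here: `predict_proba` returns posterior state probabilities given the observations, so the first row is not the prior `start_probs` on its own. The sketch below is not hmmlearn's implementation. It is a hand-written backward pass under two simplifying assumptions: each time step emits a single symbol index (0 to 3) rather than the count vectors used in the linked example, and the parameters are the ones written above rather than whatever a fitted `new_model` has learned. The helper name `first_state_posterior` is hypothetical.

```python
start_probs = [0.9, 0.1]
trans_mat = [[0.8, 0.2], [0.2, 0.8]]
emission_probs = [[0.25, 0.1, 0.4, 0.25],
                  [0.2, 0.5, 0.1, 0.2]]


def first_state_posterior(obs, start, trans, emit):
    """Return P(first hidden state | whole observation sequence)."""
    n = len(start)
    # Backward message: beta[i] is proportional to
    # P(all later observations | first state is i).
    beta = [1.0] * n
    for symbol in reversed(obs[1:]):
        beta = [sum(trans[i][j] * emit[j][symbol] * beta[j] for j in range(n))
                for i in range(n)]
        total = sum(beta)
        beta = [b / total for b in beta]  # rescale to avoid underflow
    # Combine prior, first emission and backward message, then normalise.
    unnorm = [start[i] * emit[i][obs[0]] * beta[i] for i in range(n)]
    norm = sum(unnorm)
    return [u / norm for u in unnorm]


# A single observation of symbol 1, which "dog" emits far more often than "cat".
posterior = first_state_posterior([1], start_probs, trans_mat, emission_probs)
print(dict(zip(["cat", "dog"], posterior)))
```

Even with a 0.9 prior on cat, one dog-like observation pulls the posterior for cat well below 0.9, and later observations pull it further in either direction. If `new_model` was fitted to sampled data, its learned parameters (and possibly the order of its states) can also differ from the ones above.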

I was going to ask this question on the hmmlearn GitHub repository; however, as the library is no longer under active development, the developers redirected me here.

All answers are much appreciated. Jonathan