
I am starting to learn hidden Markov models, and on the wiki page, as well as on GitHub, there are a lot of examples, but most of the probabilities are already given (70% chance of rain, 30% chance of changing state, etc.). The spell-checking and sentence examples seem to study books and then rank the probabilities of words.

So does the Markov model include a way of figuring out the probabilities, or are we supposed to use some other model to pre-calculate them?

Sorry if this question is off. I think it's straightforward how the hidden Markov model selects probable sequences, but the probability part is a bit grey to me (because it's often just provided). Examples or any info would be great.


For those not familiar with Markov models, here's an example (from Wikipedia): http://en.wikipedia.org/wiki/Viterbi_algorithm and http://en.wikipedia.org/wiki/Hidden_Markov_model

#!/usr/bin/env python

states = ('Rainy', 'Sunny')

observations = ('walk', 'shop', 'clean')

start_probability = {'Rainy': 0.6, 'Sunny': 0.4}

transition_probability = {
   'Rainy' : {'Rainy': 0.7, 'Sunny': 0.3},
   'Sunny' : {'Rainy': 0.4, 'Sunny': 0.6},
   }

emission_probability = {
   'Rainy' : {'walk': 0.1, 'shop': 0.4, 'clean': 0.5},
   'Sunny' : {'walk': 0.6, 'shop': 0.3, 'clean': 0.1},
   }

#application code
# Helps visualize the steps of Viterbi.
def print_dptable(V):
    # Print the dynamic-programming table: one column per time step,
    # one row per state.
    print("     " + " ".join("%7d" % i for i in range(len(V))))
    for y in V[0]:
        row = " ".join("%.7s" % ("%f" % V[t][y]) for t in range(len(V)))
        print("%.5s:  %s" % (y, row))

def viterbi(obs, states, start_p, trans_p, emit_p):
    V = [{}]
    path = {}

    # Initialize base cases (t == 0)
    for y in states:
        V[0][y] = start_p[y] * emit_p[y][obs[0]]
        path[y] = [y]

    # Run Viterbi for t > 0
    for t in range(1,len(obs)):
        V.append({})
        newpath = {}

        for y in states:
            (prob, state) = max([(V[t-1][y0] * trans_p[y0][y] * emit_p[y][obs[t]], y0) for y0 in states])
            V[t][y] = prob
            newpath[y] = path[state] + [y]

        # Don't need to remember the old paths
        path = newpath

    print_dptable(V)
    (prob, state) = max([(V[len(obs) - 1][y], y) for y in states])
    return (prob, path[state])



#start trigger
def example():
    return viterbi(observations,
                   states,
                   start_probability,
                   transition_probability,
                   emission_probability)
print(example())
Lostsoul

1 Answer


You're looking for an EM (expectation maximization) algorithm to compute the unknown parameters from sets of observed sequences. Probably the most commonly used is the Baum-Welch algorithm, which uses the forward-backward algorithm.
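
To make that concrete, here is a minimal Baum-Welch sketch for a discrete HMM in the same setting as the question's code (the function names and the toy observation sequence below are my own illustration, not part of the original example). Each iteration runs a forward pass and a backward pass (the E-step) and then re-estimates the start, transition, and emission probabilities from the expected counts (the M-step):

import numpy as np

def baum_welch(obs_seq, n_states, n_symbols, n_iter=50, seed=0):
    rng = np.random.default_rng(seed)
    # Random (row-normalized) initial guesses for the parameters.
    start = rng.random(n_states)
    start /= start.sum()
    trans = rng.random((n_states, n_states))
    trans /= trans.sum(axis=1, keepdims=True)
    emit = rng.random((n_states, n_symbols))
    emit /= emit.sum(axis=1, keepdims=True)

    T = len(obs_seq)
    for _ in range(n_iter):
        # E-step: forward and backward passes.
        alpha = np.zeros((T, n_states))
        beta = np.zeros((T, n_states))
        alpha[0] = start * emit[:, obs_seq[0]]
        for t in range(1, T):
            alpha[t] = (alpha[t - 1] @ trans) * emit[:, obs_seq[t]]
        beta[T - 1] = 1.0
        for t in range(T - 2, -1, -1):
            beta[t] = trans @ (emit[:, obs_seq[t + 1]] * beta[t + 1])

        likelihood = alpha[-1].sum()
        gamma = alpha * beta / likelihood            # P(state at t | observations)
        xi = np.zeros((T - 1, n_states, n_states))   # P(state t -> state t+1 | observations)
        for t in range(T - 1):
            xi[t] = (alpha[t][:, None] * trans *
                     emit[:, obs_seq[t + 1]] * beta[t + 1]) / likelihood

        # M-step: re-estimate the parameters from the expected counts.
        start = gamma[0]
        trans = xi.sum(axis=0) / gamma[:-1].sum(axis=0)[:, None]
        emit = np.zeros_like(emit)
        for k in range(n_symbols):
            emit[:, k] = gamma[obs_seq == k].sum(axis=0)
        emit /= gamma.sum(axis=0)[:, None]
    return start, trans, emit

# Example: observations encoded as integers (0=walk, 1=shop, 2=clean).
obs = np.array([0, 1, 2, 2, 0, 1, 2, 0, 0, 1, 2, 2, 2, 0])
print(baum_welch(obs, n_states=2, n_symbols=3))

Note that this plain version will underflow on long observation sequences; practical implementations scale the forward/backward variables at each step or work in log space.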

For reference, here is a set of slides I've used previously to review HMMs. It has a nice overview of Forward-Backward, Viterbi, and Baum-Welch.

Dusty
  • Thank you very much. I had read the links before; the slides were really good. They did clear up a lot of the questions I had, but I'm still unsure how the probabilities are figured out. For example, on slide 41 they assign probabilities to each node (1/3, 1/2, etc.). I'm trying to figure out how to get those, and keep updating them. It may be in the slides and I'm missing it; I'm new to this, so I'm going to study it more carefully over the weekend. Thanks for the slides and answer. – Lostsoul Oct 28 '11 at 19:28
  • @Lostsoul - Right, slide 41 and that region is just explaining how HMMs work in general. Around slide 68, it starts talking about how you go about estimating the parameters (collectively referred to as λ) from a set of observations. And the algorithm that does that is Baum-Welch. – Dusty Oct 28 '11 at 23:19
  • Thanks again, I can't thank you enough. My math sucks, so it took me several readings of the slides (and a lot of googling) to understand what's going on. I don't fully understand the math, but I get the logic now. Thanks so much again, Dusty. – Lostsoul Oct 31 '11 at 18:58
  • @Dusty: Learning/training is done by the Baum-Welch algorithm, which internally uses the forward-backward algorithm (which computes the final state when given the initial parameters of the HMM). So, given the parameters of an HMM, the Baum-Welch algorithm recursively tries to optimize the parameters themselves. Did I get it right? – garak Dec 07 '11 at 17:37