
I was looking for a precise step-by-step example of the Viterbi algorithm.

Considering sentence tagging with the input sentence as:

The cat saw the angry dog jump

And from this I would like to generate the most probable output as:

D N V T A N V

How do we use the Viterbi algorithm to get the above output using a trigram-HMM?

(PS: I'm looking for a precise step-by-step explanation, not a piece of code or a mathematical representation. Assume all probabilities are given as numbers.)

Thanks a ton!

  • The Viterbi algorithm takes a series of outputs and returns the most probable series of hidden states to produce those outputs. So, what are you trying to do if you already know the hidden states (that's D N V T etc., right)? – chm Oct 28 '12 at 20:25

2 Answers


For the Viterbi algorithm and a Hidden Markov Model, you first need the transition probabilities and the emission probabilities.

In your example, the transition probabilities are things like P(D -> N) and P(N -> V), and the emission probabilities (assuming a bigram model) are things like P(the | D) and P(cat | N).
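
To make that concrete with numbers (the values below are made up purely for illustration, not taken from any corpus), the two tables could look something like this. The dictionary keying matches the implementation further down: transitions keyed (current_tag, previous_tag), emissions keyed (word, tag).

# Hypothetical transition probabilities P(current tag | previous tag)
transition = {("N", "D"): 0.6, ("V", "N"): 0.4, ("D", "V"): 0.3, ("A", "D"): 0.2}

# Hypothetical emission probabilities P(word | tag)
emission = {("the", "D"): 0.7, ("cat", "N"): 0.05, ("saw", "V"): 0.1, ("angry", "A"): 0.02}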

Of course, in a real-world example there are many more words than the, cat, saw, and so on. You have to loop through all of your training data to estimate P(the | D), P(cat | N), P(saw | V), and the rest. Then we use the Viterbi algorithm to find the most likely sequence of tags, such as

D N V T A N V

given your observation.
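
As a rough sketch of how those estimates come from counting (the tiny tagged corpus below is completely made up, just to show the mechanics):

from collections import Counter

# Hypothetical tagged training data: a list of (word, tag) pairs.
training = [("the", "D"), ("cat", "N"), ("saw", "V"), ("the", "D"), ("dog", "N")]

tag_count = Counter(tag for _, tag in training)    # unigram tag counts
word_tag_count = Counter(training)                 # (word, tag) counts

# Maximum-likelihood emission estimate, e.g. P(cat | N) = count(cat, N) / count(N)
p_cat_given_N = word_tag_count[("cat", "N")] / tag_count["N"]

tag_seq = [tag for _, tag in training]
# Transition counts keyed (current_tag, previous_tag), matching the code below.
tag_bigram_count = Counter(zip(tag_seq[1:], tag_seq[:-1]))
# e.g. P(N | D) = count(N after D) / count(D)
p_N_given_D = tag_bigram_count[("N", "D")] / tag_count["D"]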

Here is my implementation of the Viterbi algorithm.

import numpy as np

def viterbi(vocab, vocab_tag, words, tags, t_bigram_count, t_unigram_count, e_bigram_count, e_unigram_count, ADD_K):
    # vocab: list of all words; vocab_tag: list of all tags.
    # t_*_count: transition counts, keyed (current_tag, previous_tag) and (tag).
    # e_*_count: emission counts, keyed (word, tag) and (tag).
    # ADD_K: add-k smoothing constant. The `tags` argument is unused here and
    # is kept only to preserve the original signature.
    vocab_size = len(vocab)
    V = [{}]

    # Initialization: log-probability of each tag emitting the first word,
    # with add-k smoothing.
    for t in vocab_tag:
        prob = np.log2(float(e_bigram_count.get((words[0], t), 0) + ADD_K)) \
             - np.log2(float(e_unigram_count[t] + vocab_size * ADD_K))
        V[0][t] = {"prob": prob, "prev": None}

    # Recursion: for each later word, pick the previous tag that maximizes
    # best-path-so-far + transition, then add the emission term.
    for i in range(1, len(words)):
        V.append({})
        for t in vocab_tag:
            V[i][t] = {"prob": float("-inf"), "prev": None}
        for t in vocab_tag:
            max_trans_prob = float("-inf")
            for prev_tag in vocab_tag:
                trans_prob = np.log2(float(t_bigram_count.get((t, prev_tag), 0) + ADD_K)) \
                           - np.log2(float(t_unigram_count[prev_tag] + vocab_size * ADD_K))
                if V[i - 1][prev_tag]["prob"] + trans_prob > max_trans_prob:
                    max_trans_prob = V[i - 1][prev_tag]["prob"] + trans_prob
                    max_prob = max_trans_prob \
                             + np.log2(float(e_bigram_count.get((words[i], t), 0) + ADD_K)) \
                             - np.log2(float(e_unigram_count[t] + vocab_size * ADD_K))
                    V[i][t] = {"prob": max_prob, "prev": prev_tag}

    # Termination: take the best final tag, then follow the back-pointers.
    opt = []
    previous = None
    max_prob = max(value["prob"] for value in V[-1].values())
    for st, data in V[-1].items():
        if data["prob"] == max_prob:
            opt.append(st)
            previous = st
            break
    for i in range(len(V) - 2, -1, -1):
        opt.insert(0, V[i + 1][previous]["prev"])
        previous = V[i + 1][previous]["prev"]
    return opt
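
A hypothetical call might look like this (the vocabulary and all counts below are small, made-up stand-ins for statistics you would actually gather from a tagged corpus):

vocab = ["the", "cat", "saw", "angry", "dog", "jump"]
vocab_tag = ["D", "N", "V", "A"]
words = ["the", "cat", "saw", "the", "angry", "dog", "jump"]

e_bigram_count = {("the", "D"): 20, ("cat", "N"): 3, ("saw", "V"): 2,
                  ("angry", "A"): 1, ("dog", "N"): 4, ("jump", "V"): 1}
e_unigram_count = {"D": 25, "N": 10, "V": 6, "A": 2}
t_bigram_count = {("N", "D"): 15, ("V", "N"): 5, ("D", "V"): 4,
                  ("A", "D"): 2, ("N", "A"): 2}
t_unigram_count = {"D": 25, "N": 10, "V": 6, "A": 2}

tag_sequence = viterbi(vocab, vocab_tag, words, None, t_bigram_count, t_unigram_count,
                       e_bigram_count, e_unigram_count, ADD_K=0.1)
print(tag_sequence)   # the most likely tag sequence under these made-up counts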
– aerin

I suggest that you look it up in one of the available books, e.g. Chris Bishop's "Pattern Recognition and Machine Learning". The Viterbi algorithm is a fairly basic technique and has been described at various levels of detail in the literature.

– Qnan