Part of speech tagging with Viterbi algorithm

Question

I am working on a project where I need to use the Viterbi algorithm to do part-of-speech tagging on a list of sentences. For my training data I have sentences that are already tagged word by word, which I assume I need to parse and store in some data structure. I also have test data, which likewise contains sentences where each word is tagged.

I'm a bit confused about how to approach this problem. I guess part of the issue is that I don't think I fully understand the point of the Viterbi algorithm. Am I supposed to use the Viterbi algorithm to tag my test data and compare the results to the actual tags? What data structures are best for this, and for representing a sentence?

Any help would be greatly appreciated.

homework tag... http://stackoverflow.com/questions/9729968/python-implementation-of-viterbi-algorithm — alvas, Feb 27 '14 at 15:15

score 2 · Answer 1 · answered May 09 '17 at 21:05

The Viterbi algorithm is not what tags your training data. For training you need data that has already been tagged manually (or semi-automatically, by a state-of-the-art parser).
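As a rough illustration of the training side, here is a minimal sketch of one way to represent tagged sentences and collect the counts a bigram HMM tagger is usually built from. The names `count_hmm` and `START`, the tag labels, and the toy sentences are all assumptions made up for this example.

```python
from collections import Counter, defaultdict

START = "<s>"  # hypothetical pseudo-tag marking the start of a sentence


def count_hmm(tagged_sentences):
    """Count tag-to-tag transitions and tag-to-word emissions.

    Each sentence is assumed to be a list of (word, tag) pairs.
    """
    transitions = defaultdict(Counter)  # transitions[prev_tag][tag]
    emissions = defaultdict(Counter)    # emissions[tag][word]
    for sentence in tagged_sentences:
        prev = START
        for word, tag in sentence:
            transitions[prev][tag] += 1
            emissions[tag][word.lower()] += 1
            prev = tag
    return transitions, emissions


# Toy training data (assumed format, not from the question).
train = [
    [("The", "DET"), ("dog", "NOUN"), ("barks", "VERB")],
    [("The", "DET"), ("cat", "NOUN"), ("sleeps", "VERB")],
]
transitions, emissions = count_hmm(train)
```

Dividing each count by its row total gives transition and emission probabilities; in practice you would probably also want some smoothing for unseen words.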

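Viterbi is used to calculate the best path to each node, that is, the path to that node with the lowest negative log probability (equivalently, the highest probability). In POS tagging a node is a (position, tag) pair, so the best path to the last position gives the most likely tag sequence for the sentence.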
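The lowest-negative-log-probability path can be sketched as below for a bigram HMM. This is a minimal sketch, not the linked implementation: the function names, the `floor` used for unseen events, and the toy probabilities are assumptions chosen only to make the example run.

```python
import math


def neg_log(p, floor=1e-12):
    """Negative log probability; `floor` is an assumed stand-in for smoothing."""
    return -math.log(max(p, floor))


def viterbi(words, tags, start_p, trans_p, emit_p):
    """Return the tag sequence with the lowest total negative log probability."""
    if not words:
        return []
    # cost[i][t]: lowest cost of any tag path that ends in tag t at position i
    cost = [{t: neg_log(start_p.get(t, 0.0)) + neg_log(emit_p[t].get(words[0], 0.0))
             for t in tags}]
    back = [{}]  # back[i][t]: previous tag on the best path to (i, t)
    for i in range(1, len(words)):
        cost.append({})
        back.append({})
        for t in tags:
            best_prev = min(
                tags,
                key=lambda p: cost[i - 1][p] + neg_log(trans_p[p].get(t, 0.0)),
            )
            cost[i][t] = (cost[i - 1][best_prev]
                          + neg_log(trans_p[best_prev].get(t, 0.0))
                          + neg_log(emit_p[t].get(words[i], 0.0)))
            back[i][t] = best_prev
    # Follow the back-pointers from the cheapest final tag.
    path = [min(tags, key=lambda t: cost[-1][t])]
    for i in range(len(words) - 1, 0, -1):
        path.append(back[i][path[-1]])
    path.reverse()
    return path


# Toy model (made-up numbers, for illustration only).
tags = ["DET", "NOUN", "VERB"]
start_p = {"DET": 0.8, "NOUN": 0.1, "VERB": 0.1}
trans_p = {
    "DET": {"NOUN": 0.9, "VERB": 0.1},
    "NOUN": {"VERB": 0.8, "NOUN": 0.2},
    "VERB": {"DET": 0.6, "NOUN": 0.4},
}
emit_p = {
    "DET": {"the": 1.0},
    "NOUN": {"dog": 0.5, "cat": 0.4, "barks": 0.1},
    "VERB": {"barks": 0.7, "sleeps": 0.3},
}
predicted = viterbi(["the", "dog", "barks"], tags, start_p, trans_p, emit_p)
print(predicted)  # ['DET', 'NOUN', 'VERB']
```

Running this over each test sentence and comparing the predicted tags with the gold tags gives a tagging accuracy.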

Python implementation of HMM (Viterbi) POS Tagger: https://github.com/zachguo/HMM-Trigram-Tagger/blob/master/HMM.py
