I am working on a project where I need to use the Viterbi algorithm to do part-of-speech tagging on a list of sentences. For my training data, I have sentences in which each word is already tagged; I assume I need to parse these and store them in some data structure. I also have test data, which likewise contains sentences where each word is tagged.

I'm a bit confused about how to approach this problem. I guess part of the issue is that I don't think I fully understand the point of the Viterbi algorithm. Am I supposed to use the Viterbi algorithm to tag my test data and compare the results to the actual tags? What data structures are best suited to doing this and to representing a sentence?

Any help would be greatly appreciated.

user2604504
  • homework tag... http://stackoverflow.com/questions/9729968/python-implementation-of-viterbi-algorithm – alvas Feb 27 '14 at 15:15

1 Answer

The Viterbi algorithm is not what produces your tagged training data. For training, you should already have data that was tagged manually (or semi-automatically by a state-of-the-art parser).
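
As a rough illustration of how tagged training sentences might be stored, here is a minimal sketch. It assumes each training line looks like `the/DT dog/NN barks/VBZ` (your file format may differ), and the names `parse_tagged_sentence`, `count_hmm_events` and the `<s>` start marker are made up for this example. A sentence becomes a list of `(word, tag)` tuples, and the counts an HMM tagger needs go into nested counters.

```python
from collections import Counter, defaultdict

def parse_tagged_sentence(line):
    """Split 'the/DT dog/NN' into [('the', 'DT'), ('dog', 'NN')]."""
    # Assumed format: word/TAG tokens separated by whitespace.
    # rsplit on the last '/' so words that contain '/' still parse.
    return [tuple(token.rsplit("/", 1)) for token in line.split()]

def count_hmm_events(tagged_sentences, start="<s>"):
    """Count tag-to-tag transitions and tag-to-word emissions."""
    transitions = defaultdict(Counter)  # transitions[prev_tag][tag]
    emissions = defaultdict(Counter)    # emissions[tag][word]
    for sentence in tagged_sentences:
        prev = start  # hypothetical start-of-sentence marker
        for word, tag in sentence:
            transitions[prev][tag] += 1
            emissions[tag][word] += 1
            prev = tag
    return transitions, emissions

# Toy corpus, invented for illustration only.
corpus = ["the/DT dog/NN barks/VBZ", "the/DT cat/NN sleeps/VBZ"]
sentences = [parse_tagged_sentence(line) for line in corpus]
transitions, emissions = count_hmm_events(sentences)
```

Dividing each count by its row total would turn these counters into the transition and emission probabilities that the decoder uses; a real tagger would also need some smoothing for unseen words.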

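Viterbi is used to calculate the best path to each node, that is, the path to that node with the lowest negative log probability. In POS tagging a node is a (word position, tag) pair, so the best path to the last word gives the most likely tag sequence for the whole sentence.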
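
To make the "lowest negative log probability path" idea concrete, here is a small bigram sketch. It is not the code from the linked repository; the probability tables are invented toy numbers, the `floor` for unseen events is a crude stand-in for real smoothing, and the sentence is assumed to be non-empty.

```python
import math

def viterbi(words, tags, start_p, trans_p, emit_p, floor=1e-12):
    """Return (tags, cost) for the path with the lowest negative log probability."""
    def cost(p):
        # Negative log probability; the floor for unseen events is an
        # assumption of this sketch, not proper smoothing.
        return -math.log(max(p, floor))

    # best[i][t] = lowest cost of any tag path that ends in tag t at word i
    best = [{t: cost(start_p.get(t, 0)) + cost(emit_p[t].get(words[0], 0))
             for t in tags}]
    back = [{}]  # back[i][t] = previous tag on that cheapest path
    for i in range(1, len(words)):
        best.append({})
        back.append({})
        for t in tags:
            prev, c = min(
                ((p, best[i - 1][p] + cost(trans_p[p].get(t, 0))) for p in tags),
                key=lambda pair: pair[1],
            )
            best[i][t] = c + cost(emit_p[t].get(words[i], 0))
            back[i][t] = prev

    # Follow the back-pointers from the cheapest final node.
    last = min(best[-1], key=best[-1].get)
    path = [last]
    for i in range(len(words) - 1, 0, -1):
        path.append(back[i][path[-1]])
    path.reverse()
    return path, best[-1][last]

# Toy model, invented for illustration only.
tags = ["DT", "NN", "VBZ"]
start_p = {"DT": 0.8, "NN": 0.1, "VBZ": 0.1}
trans_p = {
    "DT": {"DT": 0.05, "NN": 0.9, "VBZ": 0.05},
    "NN": {"DT": 0.1, "NN": 0.1, "VBZ": 0.8},
    "VBZ": {"DT": 0.5, "NN": 0.3, "VBZ": 0.2},
}
emit_p = {
    "DT": {"the": 0.9},
    "NN": {"dog": 0.5, "barks": 0.1},
    "VBZ": {"barks": 0.6},
}

predicted, total_cost = viterbi(["the", "dog", "barks"], tags, start_p, trans_p, emit_p)

# Comparing against the gold tags of a test sentence gives a per-word accuracy.
gold = ["DT", "NN", "VBZ"]
accuracy = sum(p == g for p, g in zip(predicted, gold)) / len(gold)
```

Working with summed negative logs instead of multiplied probabilities avoids numeric underflow on long sentences, which is why the minimum cost rather than the maximum probability is tracked.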

Python implementation of HMM (Viterbi) POS Tagger: https://github.com/zachguo/HMM-Trigram-Tagger/blob/master/HMM.py

aerin