I am working on a project where I need to use the Viterbi algorithm to do part-of-speech tagging on a list of sentences. My training data consists of sentences in which every word is already tagged, which I assume I need to parse and store in some data structure. I also have test data, which likewise contains sentences where each word is tagged.
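To make the parsing step concrete, here is a rough sketch of what I have in mind. It assumes one sentence per line with tokens written as `word/TAG`, which may not match my actual file format, and the names `parse_tagged_line`, `count_hmm_statistics`, and the `<s>` start marker are ones I made up for illustration.

```python
from collections import Counter, defaultdict

START = "<s>"  # hypothetical marker for "beginning of sentence"

def parse_tagged_line(line):
    """Turn 'The/DT dog/NN' into [('The', 'DT'), ('dog', 'NN')].

    Assumes the word/TAG token format; adjust for the real corpus.
    """
    pairs = []
    for token in line.split():
        # rpartition splits on the LAST '/', so words containing '/' survive.
        word, _, tag = token.rpartition("/")
        pairs.append((word, tag))
    return pairs

def count_hmm_statistics(sentences):
    """Count tag-to-tag transitions and tag-to-word emissions."""
    transitions = defaultdict(Counter)  # transitions[prev_tag][tag] -> count
    emissions = defaultdict(Counter)    # emissions[tag][word] -> count
    for sentence in sentences:
        prev = START
        for word, tag in sentence:
            transitions[prev][tag] += 1
            emissions[tag][word.lower()] += 1
            prev = tag
    return transitions, emissions

# Toy training data, invented for this example.
train_lines = ["The/DT dog/NN barks/VBZ", "The/DT cat/NN sleeps/VBZ"]
train = [parse_tagged_line(line) for line in train_lines]
transitions, emissions = count_hmm_statistics(train)
print(transitions[START]["DT"], emissions["NN"]["dog"])
```

So a sentence would be a list of `(word, tag)` tuples, and the model would be two nested count tables. I am not sure this is the best representation, but it seems to hold everything needed to estimate probabilities later.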
I'm a bit confused about how to approach this problem. I think part of the issue is that I don't fully understand the point of the Viterbi algorithm. Am I supposed to use the Viterbi algorithm to tag my test data and then compare the results to the actual tags? What data structures are best for doing this and for representing a sentence?
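In case it helps to show where my understanding stands, below is a minimal sketch of the pipeline as I currently picture it: estimate a bigram HMM from the tagged training sentences, run Viterbi over the words of each test sentence, and score the predicted tags against the gold tags. The add-one smoothing, the toy data, and every function name here are my own assumptions, not something given in the assignment.

```python
import math
from collections import Counter, defaultdict

START = "<s>"  # hypothetical start-of-sentence marker

def train_hmm(tagged_sentences):
    """Count transitions and emissions from lists of (word, tag) pairs."""
    trans, emit, vocab = defaultdict(Counter), defaultdict(Counter), set()
    for sent in tagged_sentences:
        prev = START
        for word, tag in sent:
            trans[prev][tag] += 1
            emit[tag][word] += 1
            vocab.add(word)
            prev = tag
    return trans, emit, vocab

def log_prob(counter, key, n_outcomes):
    """Add-one smoothed log probability (an assumed smoothing choice)."""
    return math.log((counter[key] + 1) / (sum(counter.values()) + n_outcomes))

def viterbi(words, trans, emit, vocab):
    """Return the most probable tag sequence for words under the HMM."""
    if not words:
        return []
    tags = sorted(emit)
    n_tags, n_words = len(tags), len(vocab) + 1  # +1 slot for unknown words
    # best[i][t]: log prob of the best tag path for words[:i+1] ending in t
    best = [{t: log_prob(trans[START], t, n_tags)
                + log_prob(emit[t], words[0], n_words) for t in tags}]
    back = [{}]  # back[i][t]: previous tag on that best path
    for i in range(1, len(words)):
        best.append({})
        back.append({})
        for t in tags:
            prev = max(tags, key=lambda p: best[i - 1][p]
                       + log_prob(trans[p], t, n_tags))
            best[i][t] = (best[i - 1][prev]
                          + log_prob(trans[prev], t, n_tags)
                          + log_prob(emit[t], words[i], n_words))
            back[i][t] = prev
    # Follow the back pointers from the best final tag.
    path = [max(tags, key=lambda t: best[-1][t])]
    for i in range(len(words) - 1, 0, -1):
        path.append(back[i][path[-1]])
    return path[::-1]

def accuracy(test_sentences, trans, emit, vocab):
    """Fraction of test words whose predicted tag matches the gold tag."""
    correct = total = 0
    for sent in test_sentences:
        words = [w for w, _ in sent]
        gold = [t for _, t in sent]
        predicted = viterbi(words, trans, emit, vocab)
        correct += sum(p == g for p, g in zip(predicted, gold))
        total += len(gold)
    return correct / total if total else 0.0

# Toy data, invented for this example.
train = [[("the", "DT"), ("dog", "NN"), ("barks", "VBZ")],
         [("the", "DT"), ("cat", "NN"), ("sleeps", "VBZ")]]
test = [[("the", "DT"), ("cat", "NN"), ("barks", "VBZ")]]
trans, emit, vocab = train_hmm(train)
print(viterbi(["the", "cat", "barks"], trans, emit, vocab))
print(accuracy(test, trans, emit, vocab))
```

If this is roughly the right idea, then the tags in the test data are only used for scoring and Viterbi never sees them. Is that the intended use?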
Any help would be greatly appreciated.