I have to implement a discriminatively trained, supervised part-of-speech tagger, and I have been looking at a few training techniques, including maximum likelihood, the perceptron, and large margin (SVM). After reading the experimental results reported in a couple of research papers, I have settled on SVMs. I have been studying them for some time, but a few points in the theory are still confusing to me. Can someone please point me to relevant reading material on a practical implementation, or clarify how to implement it using the Viterbi algorithm?
P.S.: I am not asking for a solution, just some guidance.