I've been working through a Baum-Welch problem from my machine learning class that I cannot seem to figure out.
The gist of the algorithm, if I'm understanding it correctly, is:
Expectation (positions run from 1 to N, where N is the length of s, so the sum of forward[tag][N] over all tags is P(s)):
• For each sentence s in S:
    ○ For each word/tag pair (w,t):
        ▪ For every occurrence of w (at position i) in s:
            ▫ EmissionCounts(w,t) += (forward[t][i]*backward[t][i]) / (sum of forward[tag][N] for all tags)
    ○ For every tag/tag pair (t1,t2):
        ▪ For every adjacent pair of words (starting at position i):
            ▫ TransitionCounts(t1,t2) += forward[t1][i]*P(t2|t1)*P(w[i+1]|t2)*backward[t2][i+1] / (sum of forward[tag][N] for all tags)
    ○ For every tag t:
        ▪ For the first word in the sentence:
            ▫ InitialCounts(t) += pi(t)*P(w[1]|t)*backward[t][1] / (sum of forward[tag][N] for all tags)
• For each tag t:
    ○ For every word w:
        ▪ TagCounts(t) += EmissionCounts(w,t)
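The Expectation step above can be sketched in Python as follows. This is a minimal, hypothetical illustration, not the linked implementation: the names `forward_backward` and `e_step` are made up, the parameters are assumed to be plain dicts keyed by tag or by `(word, tag)` / `(t1, t2)` tuples, and positions are 0-indexed rather than 1-indexed. It assumes every sentence is non-empty and every word has nonzero emission probability under at least one tag, and it uses unscaled probabilities, which will underflow on long sentences (per-position scaling or log-space arithmetic is the usual remedy). Note that the sketch accumulates InitialCounts across sentences with `+=`, so that dividing by the number of sentences in the Maximization step yields a distribution.

```python
import math
from collections import defaultdict


def forward_backward(sent, tags, pi, trans, emit):
    """Unscaled forward/backward tables for one sentence (positions 0..n-1)."""
    n = len(sent)
    fwd = [dict.fromkeys(tags, 0.0) for _ in range(n)]
    bwd = [dict.fromkeys(tags, 0.0) for _ in range(n)]
    # forward[t][0] = pi(t) * P(w[0] | t)
    for t in tags:
        fwd[0][t] = pi[t] * emit.get((sent[0], t), 0.0)
    # forward[t2][i] = P(w[i] | t2) * sum over t1 of forward[t1][i-1] * P(t2 | t1)
    for i in range(1, n):
        for t2 in tags:
            fwd[i][t2] = emit.get((sent[i], t2), 0.0) * sum(
                fwd[i - 1][t1] * trans[(t1, t2)] for t1 in tags)
    # backward[t][n-1] = 1; backward[t1][i] = sum over t2 of P(t2|t1) P(w[i+1]|t2) backward[t2][i+1]
    for t in tags:
        bwd[n - 1][t] = 1.0
    for i in range(n - 2, -1, -1):
        for t1 in tags:
            bwd[i][t1] = sum(
                trans[(t1, t2)] * emit.get((sent[i + 1], t2), 0.0) * bwd[i + 1][t2]
                for t2 in tags)
    return fwd, bwd


def e_step(sentences, tags, pi, trans, emit):
    """Expected counts summed over all sentences, plus the corpus log-likelihood."""
    emission_counts = defaultdict(float)
    transition_counts = defaultdict(float)
    initial_counts = defaultdict(float)
    log_likelihood = 0.0
    for sent in sentences:
        fwd, bwd = forward_backward(sent, tags, pi, trans, emit)
        z = sum(fwd[-1][t] for t in tags)  # P(s) = sum of forward[tag][N]
        log_likelihood += math.log(z)
        for i, w in enumerate(sent):
            for t in tags:
                emission_counts[(w, t)] += fwd[i][t] * bwd[i][t] / z
        for i in range(len(sent) - 1):
            for t1 in tags:
                for t2 in tags:
                    transition_counts[(t1, t2)] += (
                        fwd[i][t1] * trans[(t1, t2)]
                        * emit.get((sent[i + 1], t2), 0.0) * bwd[i + 1][t2] / z)
        for t in tags:
            # Accumulate across sentences (+=) rather than overwrite.
            initial_counts[t] += pi[t] * emit.get((sent[0], t), 0.0) * bwd[0][t] / z
    # TagCounts(t) = sum over words of EmissionCounts(w, t)
    tag_counts = defaultdict(float)
    for (w, t), c in emission_counts.items():
        tag_counts[t] += c
    return emission_counts, transition_counts, initial_counts, tag_counts, log_likelihood
```

A quick sanity check on any implementation: per sentence, the InitialCounts should sum to 1, the TagCounts to the number of words, and the TransitionCounts to the number of words minus 1.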
Maximization:
• pi(t) = InitialCounts(t)/(# sentences)
• P(t2|t1) = TransitionCounts(t1,t2)/TagCounts(t1)
• P(w|t) = EmissionCounts(w,t)/TagCounts(t)
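The Maximization step can be sketched the same way. Again this is a hypothetical illustration with made-up names (`m_step`) and dict-based parameters. One deliberate difference from the P(t2|t1) line above: this sketch divides TransitionCounts(t1,t2) by the sum of TransitionCounts(t1,·) rather than by TagCounts(t1). If the model has no explicit end state, TagCounts(t1) also counts occurrences of t1 at the last position of a sentence, where no transition follows, so dividing by it would make each row of P(t2|t1) sum to less than 1. The uniform fallback for a tag with no outgoing transitions is an arbitrary assumption.

```python
def m_step(initial_counts, transition_counts, emission_counts, tag_counts,
           num_sentences, tags):
    """Turn expected counts into new pi, P(t2|t1) and P(w|t) tables."""
    # pi(t) = InitialCounts(t) / (# sentences)
    pi = {t: initial_counts.get(t, 0.0) / num_sentences for t in tags}
    trans = {}
    for t1 in tags:
        # Expected number of transitions out of t1 (excludes sentence-final t1).
        out_total = sum(transition_counts.get((t1, t2), 0.0) for t2 in tags)
        for t2 in tags:
            if out_total > 0:
                trans[(t1, t2)] = transition_counts.get((t1, t2), 0.0) / out_total
            else:
                # Assumption: uniform fallback when t1 never transitions.
                trans[(t1, t2)] = 1.0 / len(tags)
    # P(w|t) = EmissionCounts(w,t) / TagCounts(t)
    emit = {(w, t): c / tag_counts[t]
            for (w, t), c in emission_counts.items()
            if tag_counts.get(t, 0.0) > 0}
    return pi, trans, emit
```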
Check for convergence, and repeat from the Expectation step if the model has not converged.
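A common convergence test for Baum-Welch is to track the corpus log-likelihood (the sum over sentences of log P(s), where P(s) is the sum of forward[tag][N] over all tags) and stop once an iteration improves it by less than a small tolerance. Because each EM iteration should never decrease that log-likelihood, a drop is also a useful sign of a bug in the counts or the normalization. Below is a minimal driver under those assumptions: `run_until_converged` is a hypothetical name, `step` is a hypothetical callable that runs one Expectation and one Maximization pass and returns the new parameters together with the log-likelihood under the old ones, and the `tol`, `max_iter` and `slack` defaults are arbitrary.

```python
import math


def run_until_converged(step, params, tol=1e-6, max_iter=100, slack=1e-9):
    """Repeat one EM iteration until the log-likelihood gain falls below tol.

    step(params) must return (new_params, log_likelihood), where
    log_likelihood is the corpus log-likelihood under the *input* params.
    """
    prev_ll = -math.inf
    history = []
    for _ in range(max_iter):
        new_params, ll = step(params)
        history.append(ll)
        # EM should never lower the likelihood; allow tiny floating-point noise.
        if ll < prev_ll - slack:
            raise ValueError(f"log-likelihood decreased: {prev_ll} -> {ll}")
        params = new_params
        if ll - prev_ll < tol:
            break
        prev_ll = ll
    return params, history
```

Returning the whole `history` makes it easy to print or plot the log-likelihood per iteration, which is often the quickest way to see whether an E-step or M-step is misbehaving.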
Here's a link to my Baum-Welch implementation. Does anyone have any ideas about what I may be doing wrong?
https://gist.github.com/dmcquillan314/4058b9048799e3488a05
Here's a link to the entire repo it's from as well: https://github.com/dmcquillan314/HW6