I am trying to get my hands dirty with NLTK part-of-speech tagging. I am using the Brill tagger, which learns a series of transformation rules. My templates are as follows:
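from nltk.tbl.template import Template
from nltk.tag.brill import Pos, Word

templates = [
    # POS tags of following tokens
    Template(Pos(1, 1)),
    Template(Pos(2, 2)),
    Template(Pos(1, 2)),
    Template(Pos(1, 3)),
    # Words of following tokens
    Template(Word(1, 1)),
    Template(Word(2, 2)),
    Template(Word(1, 2)),
    Template(Word(1, 3)),
    # Previous and next token combined
    Template(Pos(-1, -1), Pos(1, 1)),
    Template(Word(-1, -1), Word(1, 1)),
]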
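To make my reading of the template arguments explicit, here is a minimal sketch. It assumes that a feature written as Pos(start, end) covers every offset from start to end inclusive, which is how I understand NLTK's Feature constructor. expand_positions is a hypothetical helper of mine, not an NLTK function.

```python
def expand_positions(start, end):
    """Hypothetical helper: offsets covered by a feature given as (start, end).

    Assumption: both bounds are inclusive, so (1, 3) covers offsets 1, 2 and 3.
    """
    if start > end:
        raise ValueError("start must not be greater than end")
    return tuple(range(start, end + 1))


# The (start, end) pairs used in the template list above.
pairs = [(1, 1), (2, 2), (1, 2), (1, 3), (-1, -1)]
expanded = {pair: expand_positions(*pair) for pair in pairs}

for pair, positions in expanded.items():
    print(pair, "->", list(positions))
```

If that assumption holds, Pos(1, 3) would be the feature that shows up as @[1,2,3] in a rule.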
My rule table looks as follows:
Found 149 useful rules.
           B      |
   S   F   r   O  |        Score = Fixed - Broken
   c   i   o   t  |  R     Fixed = num tags changed incorrect -> correct
   o   x   k   h  |  u     Broken = num tags changed correct -> incorrect
   r   e   e   e  |  l     Other = num tags changed incorrect -> incorrect
   e   d   n   r  |  e
------------------+-------------------------------------------------------
  24  24   0   1  | VB->VBP if Pos:NN@[1]
  14  14   0   2  | JJ->NN if Pos:CD@[1]
  14  14   0   0  | NN->VBP if Pos:NNS@[1,2,3]
  11  11   0   0  | TO->IN if Pos:NN@[1,2]
   9   9   0   0  | JJ->VBP if Pos:NN@[1]
   1   1   0   0  | TO->IN if Pos:VB@[1]
   1   1   0   0  | VBP->NN if Word:my_group@[1]
I am having trouble understanding what a rule says, e.g.
NN->VBP if Pos:NNS@[1,2,3]
My questions are:
- Does it mean: change the tag NN to VBP (verb) if the part-of-speech tag at position 1, 2 or 3 is NNS (plural noun)?
- Are 1, 2 and 3 offsets relative to the current token, or do they refer to the tokens at absolute indices 1, 2 and 3 in the sentence?
- How are templates related to rules? For example, is Template(Pos(1, 3)) from my list responsible for generating this rule:
NN->VBP if Pos:NNS@[1,2,3]
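To make the second question concrete, here is a minimal sketch of the "relative offsets" reading of that rule. This is only my assumed interpretation written in plain Python, not NLTK's code. apply_rule and the example sentence are hypothetical.

```python
def apply_rule(tagged, from_tag, to_tag, cond_tag, offsets):
    """Retag a sentence under the 'relative offsets' reading of a Brill rule.

    Assumption: a token tagged from_tag is changed to to_tag if any token
    at i + offset (for offset in offsets) carries cond_tag. Conditions are
    checked against the original tags, and offsets that fall outside the
    sentence are ignored.
    """
    result = list(tagged)
    for i, (word, tag) in enumerate(tagged):
        if tag != from_tag:
            continue
        neighbour_tags = [
            tagged[i + k][1] for k in offsets if 0 <= i + k < len(tagged)
        ]
        if cond_tag in neighbour_tags:
            result[i] = (word, to_tag)
    return result


# Made-up sentence in which "bark" was initially mis-tagged as NN.
sentence = [("dogs", "NNS"), ("bark", "NN"), ("at", "IN"), ("cats", "NNS")]

# NN->VBP if Pos:NNS@[1,2,3]
retagged = apply_rule(sentence, "NN", "VBP", "NNS", offsets=(1, 2, 3))
print(retagged)
```

Under this reading, "bark" is retagged because "cats" (NNS) sits two tokens to its right. Under the "absolute indices" reading, the check would instead always look at sentence positions 1, 2 and 3, whatever the current token is.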
Thanks in advance.