I am trying to get hands-on experience with part-of-speech tagging in NLTK. I am using the Brill tagger, which learns a series of transformation rules. My templates are as follows:

from nltk.tag.brill import Pos, Word
from nltk.tbl import Template

templates = [
    Template(Pos(1,1)),
    Template(Pos(2,2)),
    Template(Pos(1,2)),
    Template(Pos(1,3)),
    Template(Word(1,1)),
    Template(Word(2,2)),
    Template(Word(1,2)),
    Template(Word(1,3)),
    Template(Pos(-1, -1), Pos(1,1)),
    Template(Word(-1, -1), Word(1,1))
]
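
As far as I understand, the two arguments of `Pos` and `Word` are an inclusive start and end offset, so `Pos(1,3)` would cover offsets 1, 2 and 3. The sketch below mimics that expansion in plain Python so the template arguments can be compared with the `@[...]` lists in the rule table. `expand_positions` is a hypothetical helper of mine, not an NLTK function, and the inclusive-range reading is an assumption.

```python
def expand_positions(start, end):
    """Expand an inclusive (start, end) pair into a list of offsets.

    Assumption: this mirrors how the feature arguments are interpreted.
    """
    if start > end:
        raise ValueError("start must not exceed end")
    return list(range(start, end + 1))


# Distinct (start, end) pairs used by the templates above.
template_args = [(1, 1), (2, 2), (1, 2), (1, 3), (-1, -1)]
expanded = {args: expand_positions(*args) for args in template_args}

for args, offsets in expanded.items():
    print(args, "->", offsets)
```

Under this reading, a rule ending in `@[1,2,3]` would come from a template built with `(1, 3)`, and `@[1]` from one built with `(1, 1)`.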

My rule table looks as follows:

    Found 149 useful rules.

           B      |
   S   F   r   O  |        Score = Fixed - Broken
   c   i   o   t  |  R     Fixed = num tags changed incorrect -> correct
   o   x   k   h  |  u     Broken = num tags changed correct -> incorrect
   r   e   e   e  |  l     Other = num tags changed incorrect -> incorrect
   e   d   n   r  |  e
------------------+-------------------------------------------------------
  24  24   0   1  | VB->VBP if Pos:NN@[1]
  14  14   0   2  | JJ->NN if Pos:CD@[1]
  14  14   0   0  | NN->VBP if Pos:NNS@[1,2,3]
  11  11   0   0  | TO->IN if Pos:NN@[1,2]
   9   9   0   0  | JJ->VBP if Pos:NN@[1]
   1   1   0   0  | TO->IN if Pos:VB@[1]
   1   1   0   0  | VBP->NN if Word:my_group@[1]

I am having trouble understanding what a rule says, e.g. NN->VBP if Pos:NNS@[1,2,3]

My questions are:

  • Does it mean: change the tag NN to VBP (verb) if the part-of-speech tag at position 1, 2 or 3 is NNS (plural noun)?
  • Are 1, 2 and 3 offsets relative to the current token, or are they absolute indices of tokens in the sentence?
  • How are templates related to rules? For example, is Template(Pos(1,3)) responsible for generating the rule NN->VBP if Pos:NNS@[1,2,3]?
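
To make the first two questions concrete, here is a minimal sketch of the reading I suspect: the positions are offsets relative to the token being retagged, and the rule fires if any one of those offsets carries the given tag. This is plain Python, not NLTK code. `rule_applies` and `apply_rule` are hypothetical helpers, and the sample sentence and its tags are made up for illustration.

```python
def rule_applies(tagged, index, from_tag, cond_tag, offsets):
    """Return True if the token at `index` is tagged `from_tag` and at least
    one in-range neighbour at index + offset is tagged `cond_tag`."""
    if tagged[index][1] != from_tag:
        return False
    for off in offsets:
        j = index + off
        # Offsets that fall outside the sentence are simply skipped.
        if 0 <= j < len(tagged) and tagged[j][1] == cond_tag:
            return True
    return False


def apply_rule(tagged, from_tag, to_tag, cond_tag, offsets):
    """Return a retagged copy of `tagged`.

    All matches are found on the original tags first, then changed.
    """
    hits = [
        i for i in range(len(tagged))
        if rule_applies(tagged, i, from_tag, cond_tag, offsets)
    ]
    result = list(tagged)
    for i in hits:
        result[i] = (tagged[i][0], to_tag)
    return result


# Made-up example: "need" is wrongly tagged NN, and an NNS sits 2 tokens later.
sentence = [("I", "PRP"), ("need", "NN"), ("more", "JJR"), ("apples", "NNS")]

# Sketch of: NN->VBP if Pos:NNS@[1,2,3]
retagged = apply_rule(sentence, "NN", "VBP", "NNS", [1, 2, 3])
print(retagged)
```

If the positions were absolute indices instead, the outcome would not depend on where the NN token sits in the sentence, which is the distinction I am trying to confirm.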

Thanks in advance.

Mangu Singh Rajpurohit