Can someone please explain the behaviour of NLTK's BigramTagger in these examples?
I instantiated the tagger by
bi= BigramTagger(brown.tagged_sents(categories='news')[:500])
Now, I want to use this on one specific sentence.
>>> bi.tag(brown_sents[2])
[(u'The', u'AT'), (u'September-October', u'NP'), (u'term', u'NN'), (u'jury', u'NN'), (u'had', u'HVD'), (u'been', u'BEN'), (u'charged', u'VBN'), (u'by', u'IN'), (u'Fulton', u'NP-TL'), (u'Superior', u'JJ-TL'), (u'Court', u'NN-TL'), (u'Judge', u'NN-TL'), (u'Durwood', u'NP'), (u'Pye', u'NP'), (u'to', u'TO'), (u'investigate', u'VB'), (u'reports', u'NNS'), (u'of', u'IN'), (u'possible', u'JJ'), (u'``', u'``'), (u'irregularities', u'NNS'), (u"''", u"''"), (u'in', u'IN'), (u'the', u'AT'), (u'hard-fought', u'JJ'), (u'primary', u'NN'), (u'which', u'WDT'), (u'was', u'BEDZ'), (u'won', u'VBN'), (u'by', u'IN'), (u'Mayor-nominate', u'NN-TL'), (u'Ivan', u'NP'), (u'Allen', u'NP'), (u'Jr.', u'NP'), (u'.', u'.')]
Works well, but hey, it's all known data. Let me change one word and see if it sets something off.
>>> sent=brown_sents[2]
>>> sent[5]
u'been'
>>> sent[5] = u'was'
>>> bi.tag(sent)
[(u'The', u'AT'), (u'September-October', u'NP'), (u'term', u'NN'), (u'jury', u'NN'), (u'had', u'HVD'), (u'was', None), (u'charged', None), (u'by', None), (u'Fulton', None), (u'Superior', None), (u'Court', None), (u'Judge', None), (u'Durwood', None), (u'Pye', None), (u'to', None), (u'investigate', None), (u'reports', None), (u'of', None), (u'possible', None), (u'``', None), (u'irregularities', None), (u"''", None), (u'in', None), (u'the', None), (u'hard-fought', None), (u'primary', None), (u'which', None), (u'was', None), (u'won', None), (u'by', None), (u'Mayor-nominate', None), (u'Ivan', None), (u'Allen', None), (u'Jr.', None), (u'.', None)]
Now I expected to see changed tuple, (u'been', u'BEN')
to now be (u'been', None). Why is everything after it in the sentence now not tagged? Those words were tagged in connection to another ones, not 'been'.
Any recommendation on use of tagged sentences would be appreciated as well.