Using NLTK's UnigramTagger, I am training on tagged sentences from the Brown Corpus. I tried different categories, such as fiction, romance, and humor, and I get roughly the same accuracy for each, around 0.93.
import nltk
from nltk.corpus import brown

# Fiction
brown_tagged_sents = brown.tagged_sents(categories='fiction')
unigram_tagger = nltk.UnigramTagger(brown_tagged_sents)
unigram_tagger.evaluate(brown_tagged_sents)
>>> 0.9415956079897209

# Romance
brown_tagged_sents = brown.tagged_sents(categories='romance')
unigram_tagger = nltk.UnigramTagger(brown_tagged_sents)
unigram_tagger.evaluate(brown_tagged_sents)
>>> 0.9348490474422324
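For context, a unigram tagger essentially memorizes the most frequent tag for each word in its training data, so evaluating on the same sentences it was trained on largely measures lookup coverage. A minimal pure-Python sketch of that idea (using hypothetical toy data, not the Brown Corpus or NLTK's actual implementation):

```python
from collections import Counter, defaultdict

# Toy tagged sentences standing in for brown.tagged_sents (hypothetical data)
train = [
    [("the", "DET"), ("dog", "NOUN"), ("barks", "VERB")],
    [("the", "DET"), ("cat", "NOUN"), ("sleeps", "VERB")],
]
test = [
    [("the", "DET"), ("dog", "NOUN"), ("sleeps", "VERB")],
    [("a", "DET"), ("bird", "NOUN"), ("sings", "VERB")],  # unseen words
]

def train_unigram(tagged_sents):
    """Count tag frequencies per word; the model is just this lookup table."""
    counts = defaultdict(Counter)
    for sent in tagged_sents:
        for word, tag in sent:
            counts[word][tag] += 1
    return {w: c.most_common(1)[0][0] for w, c in counts.items()}

def accuracy(model, tagged_sents):
    """Fraction of tokens whose looked-up tag matches the gold tag."""
    total = correct = 0
    for sent in tagged_sents:
        for word, tag in sent:
            total += 1
            if model.get(word) == tag:
                correct += 1
    return correct / total

model = train_unigram(train)
print(accuracy(model, train))  # 1.0: every training word is in the table
print(accuracy(model, test))   # lower: "a", "bird", "sings" are unseen
```

Evaluating on held-out sentences, as in the second call, gives a lower score because unseen words have no entry in the table.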
Why is that the case? Is it because the categories come from the same corpus, or because their part-of-speech tag distributions are similar?