The Stanford POS Tagger docs (http://nlp.stanford.edu/software/pos-tagger-faq.shtml#h) claim the tagger can do 15,000 words a second. However, I'm getting about 7 words a second. I'm using the english-left3words-distsim.tagger model, as the docs recommend. Am I doing something wrong, or is the slowdown caused by running it through the NLTK wrapper?
```python
from nltk.tag import StanfordPOSTagger
from nltk.tokenize import word_tokenize

jar = '/Users/marie/Desktop/StandfordParser/stanford-postagger-2015-12-09/stanford-postagger.jar'
model = '/Users/marie/Desktop/StandfordParser/stanford-postagger-2015-12-09/models/english-left3words-distsim.tagger'
tagger = StanfordPOSTagger(model, jar)

tokens = word_tokenize("What's the airspeed of an unladen swallow ?")
%timeit tagger.tag(tokens)  # IPython magic; prints:
# 1 loop, best of 3: 1.01 s per loop
```
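One way to check whether that 1.01 s is a fixed cost paid on every call (for example, starting Java and loading the model) rather than time spent tagging each word is to time the tagger on inputs of two different sizes and fit t = overhead + n / rate. The sketch below is a minimal, hypothetical helper for that, not a fix: `time_tag_sents` and `split_overhead` are names made up for illustration, the per-call-overhead explanation is an assumption to test, and the numbers in the sanity check are synthetic, not measurements.

```python
import time


def time_tag_sents(tag_sents, sentences):
    """Time one call of tag_sents(sentences); return (token_count, seconds)."""
    n_tokens = sum(len(sentence) for sentence in sentences)
    start = time.perf_counter()
    tag_sents(sentences)  # e.g. NLTK's tagger.tag_sents
    return n_tokens, time.perf_counter() - start


def split_overhead(n_small, t_small, n_large, t_large):
    """Fit t = overhead + n / rate through two (tokens, seconds) timings.

    Returns (overhead_seconds, tokens_per_second).
    """
    if n_large == n_small:
        raise ValueError("need two different input sizes")
    per_token = (t_large - t_small) / (n_large - n_small)  # marginal seconds per token
    overhead = t_small - per_token * n_small  # time not explained by input size
    rate = float("inf") if per_token <= 0 else 1.0 / per_token
    return overhead, rate


# Sanity check on synthetic timings from a known model
# (1.0 s fixed cost, 15,000 tokens/s); these are NOT real measurements.
fake_time = lambda n: 1.0 + n / 15000
overhead, rate = split_overhead(9, fake_time(9), 9000, fake_time(9000))
print(f"overhead ~ {overhead:.2f} s, rate ~ {rate:,.0f} tokens/s")
```

With the tagger from the question, you could call `time_tag_sents(tagger.tag_sents, [tokens])` and `time_tag_sents(tagger.tag_sents, [tokens] * 1000)` and pass both results to `split_overhead`. If the fitted overhead comes out close to the 1.01 s above while the rate is in the thousands, the cost is per call rather than per word, and passing many sentences to a single `tag_sents` call would spread it out.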