I am trying to tag an HTML page full of space-separated numbers like "5320412185 5320412184 5320412189..." to observe how the tagger behaves with numbers. I'm passing english-left3words-distsim.tagger to the constructor. On the console, most of the numbers are tagged as CD, but some of them are tagged as NN. I searched the FAQ page on nlp.stanford.edu but couldn't find anything about this. Can anyone help me understand this behavior?
I'm not sure whether this matters, but I'm feeding each number to the tagger separately, after splitting the huge input (1,045,000 numbers!) on the space delimiter.
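A minimal sketch of the per-number approach, useful for counting how often each tag comes back. It assumes the tagger returns its default `word_TAG` output for a single token. `NumberTagCounter` and `countTags` are hypothetical helpers, and the stand-in tagger in `main` is not the Stanford tagger. It only exists so the snippet runs without the model file.

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.function.UnaryOperator;

public class NumberTagCounter {

    // Splits the input on runs of whitespace, tags each number on its own
    // (one call per token, as in the question), and tallies the returned tags.
    // `tagger` is any function that returns "word_TAG" for a single token.
    static Map<String, Integer> countTags(String input, UnaryOperator<String> tagger) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String number : input.trim().split("\\s+")) {
            if (number.isEmpty()) {
                continue; // "".split(...) yields one empty string; skip it
            }
            String tagged = tagger.apply(number).trim(); // e.g. "5320412185_CD"
            int sep = tagged.lastIndexOf('_');
            String tag = sep >= 0 ? tagged.substring(sep + 1) : "?";
            counts.merge(tag, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Hypothetical stand-in that tags every token as CD; it says nothing
        // about how the real model behaves.
        UnaryOperator<String> stubTagger = w -> w + "_CD";
        System.out.println(countTags("5320412185 5320412184 5320412189", stubTagger));
    }
}
```

With the real tagger on the classpath, you would assume something like `MaxentTagger tagger = new MaxentTagger("path/to/english-left3words-distsim.tagger");` (path hypothetical) and call `countTags(input, tagger::tagString)`. Note that tagging each number in isolation gives the model no surrounding context, which is one difference from tagging the whole line at once.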