Proper Noun detection in Acronyms with the POS Tagger

Question

I'm working on a natural language search engine for the CDS (the Astronomical Data Center of Strasbourg).

I was wondering how the Stanford Part-Of-Speech tagger tags acronyms, since an acronym is sometimes tagged as NNP and sometimes just as NN.

I wasn't able to find out exactly how the program decides whether an acronym like "CDS" or "NASA" is an NNP or an NN.

If someone could help me with this, I'd be really glad. :)

Have a good day.

score 0 · Answer 1 · answered May 11 '17 at 21:30

The POS tagger is a statistical model that is trained on thousands of sentences from the Wall Street Journal. The tag it assigns to a word can be influenced by factors such as what character sequences appear in the word and what words surround the word in the sentence.

There are more details available here: https://nlp.stanford.edu/software/tagger.shtml
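
To make the two kinds of evidence mentioned above more concrete, here is a minimal Python sketch of the sort of features a statistical tagger could look at for one token. The function name `context_features` and every feature name in it are hypothetical and chosen only for illustration; this is not the Stanford tagger's actual feature set or API.

```python
def context_features(tokens, i):
    """Collect illustrative features for tokens[i].

    Hypothetical feature names; a real tagger defines its own templates.
    """
    word = tokens[i]
    return {
        # Character-level evidence from the word itself
        "is_all_caps": word.isupper(),
        "has_digit": any(c.isdigit() for c in word),
        "prefix2": word[:2],
        "suffix2": word[-2:],
        # Evidence from the surrounding words in the sentence
        "prev_word": tokens[i - 1] if i > 0 else "<S>",
        "next_word": tokens[i + 1] if i + 1 < len(tokens) else "</S>",
    }


sentence = "The CDS hosts astronomical catalogues".split()
features = context_features(sentence, 1)  # features for "CDS"
print(features)
```

A trained model weighs features like these together, so the same acronym may plausibly receive NNP in one sentence and NN in another when its neighbours differ.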

