Why does Stanford CoreNLP NER-annotator load 3 models by default?

Question

When I add the "ner" annotator to my StanfordCoreNLP object pipeline, I can see that it loads 3 models, which takes a lot of time:

Adding annotator ner
Loading classifier from edu/stanford/nlp/models/ner/english.all.3class.distsim.crf.ser.gz ... done [10.3 sec].
Loading classifier from edu/stanford/nlp/models/ner/english.muc.7class.distsim.crf.ser.gz ... done [10.1 sec].
Loading classifier from edu/stanford/nlp/models/ner/english.conll.4class.distsim.crf.ser.gz ... done [6.5 sec].
Initializing JollyDayHoliday for SUTime from classpath: edu/stanford/nlp/models/sutime/jollyday/Holidays_sutime.xml as sutime.binder.1.
Reading TokensRegex rules from edu/stanford/nlp/models/sutime/defs.sutime.txt
Reading TokensRegex rules from edu/stanford/nlp/models/sutime/english.sutime.txt
Reading TokensRegex rules from edu/stanford/nlp/models/sutime/english.holidays.sutime.txt

Is there a way to load just a subset that will work equally well? In particular, I am unsure why it loads the 3-class and 4-class NER models when it already has the 7-class model, and I'm wondering whether it will still work if these two are not loaded.

StanfordNLPHelp · Accepted Answer · 2015-11-25T10:24:09.213

You can set which models are loaded in this manner:

command line:

-ner.model model_path1,model_path2

Java code:

 props.put("ner.model", "model_path1,model_path2");

Where model_path1 and model_path2 should be something like:

"edu/stanford/nlp/models/ner/english.all.3class.distsim.crf.ser.gz"
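As a minimal sketch of the Java route, the snippet below builds a Properties object that lists only the 7-class model, using the path shown in the question's log. The class name NerModelConfig, the method name sevenClassOnly, and the annotator list are assumptions made for illustration; the StanfordCoreNLP constructor call is left as a comment so the snippet compiles without CoreNLP on the classpath.

```java
import java.util.Properties;

// Hypothetical helper class; the name is illustrative only.
class NerModelConfig {
    // Model path copied from the loading log in the question.
    static final String SEVEN_CLASS =
        "edu/stanford/nlp/models/ner/english.muc.7class.distsim.crf.ser.gz";

    // Builds pipeline properties that load a single NER model.
    static Properties sevenClassOnly() {
        Properties props = new Properties();
        // Assumed annotator list; adjust to whatever your pipeline needs.
        props.setProperty("annotators", "tokenize,ssplit,pos,lemma,ner");
        // Comma-separated list of models; here it has just one entry.
        props.setProperty("ner.model", SEVEN_CLASS);
        return props;
    }

    public static void main(String[] args) {
        Properties props = sevenClassOnly();
        System.out.println("ner.model = " + props.getProperty("ner.model"));
        // With CoreNLP on the classpath, the pipeline would be created as:
        // new edu.stanford.nlp.pipeline.StanfordCoreNLP(props);
    }
}
```

To load two models instead, join their paths with a comma in the "ner.model" value; the order of the list is the order in which the models are applied.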

The models are applied in layers. The first model is run and its tags are applied, then the second, then the third, and so on. If you want fewer models, you can put 1 or 2 models in the list instead of the three defaults, but this will change the tags the system produces.

If you set "ner.combinationMode" to "HIGH_RECALL", every model is allowed to apply all of its tags. If you set "ner.combinationMode" to "NORMAL", a later model cannot apply any tag that an earlier model already applies.

The three default models were each trained on different data. For instance, the 3-class model was trained on substantially more data than the 7-class model. So each model contributes something different, and their results are combined to create the final tag sequence.
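The layering can be illustrated with a toy simulation. This is a simplified sketch and not CoreNLP's actual implementation: it merges token by token, assumes "O" marks an untagged token, and assumes that in NORMAL mode a tag counts as taken when an earlier model has it in its label set. The class name NerLayeringSketch, the combine method, and the sample tags are all hypothetical.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Toy model of layered NER tagging; not CoreNLP code.
class NerLayeringSketch {
    static final String O = "O"; // assumed marker for "no entity"

    // outputs: one tag array per model, in priority order.
    // labels:  the tag set each model can emit (excluding "O").
    static String[] combine(List<String[]> outputs, List<Set<String>> labels,
                            boolean highRecall) {
        String[] merged = outputs.get(0).clone();
        Set<String> seen = new HashSet<>(labels.get(0));
        for (int m = 1; m < outputs.size(); m++) {
            String[] aux = outputs.get(m);
            for (int i = 0; i < merged.length; i++) {
                boolean free = merged[i].equals(O);   // earlier models left it untagged
                boolean allowed = highRecall || !seen.contains(aux[i]);
                if (free && !aux[i].equals(O) && allowed) {
                    merged[i] = aux[i];               // later model fills the gap
                }
            }
            seen.addAll(labels.get(m));
        }
        return merged;
    }

    public static void main(String[] args) {
        // Hypothetical tags for the tokens: Obama visited Paris on Monday
        String[] model1 = {"PERSON", O, O, O, O};            // missed "Paris"
        String[] model2 = {"PERSON", O, "LOCATION", O, "DATE"};
        Set<String> labels1 = Set.of("PERSON", "LOCATION", "ORGANIZATION");
        Set<String> labels2 = Set.of("PERSON", "LOCATION", "ORGANIZATION", "DATE");
        List<String[]> outputs = List.of(model1, model2);
        List<Set<String>> labels = List.of(labels1, labels2);

        System.out.println(Arrays.toString(combine(outputs, labels, false)));
        // prints [PERSON, O, O, O, DATE]
        System.out.println(Arrays.toString(combine(outputs, labels, true)));
        // prints [PERSON, O, LOCATION, O, DATE]
    }
}
```

In this toy example the first model's tags are never overwritten in either mode. The modes differ only in whether the second model may fill an untagged token with LOCATION, a tag the first model also covers.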

Any idea what happens when the models' tag predictions conflict? Which one gets priority? - Hima, Jun 17 '19 at 11:06
If the list is model1,model2,model3, then model1 has top priority, then model2, then model3. The earlier models in the list take precedence: model2 cannot overwrite any tokens that model1 has already tagged. - StanfordNLPHelp, Jun 17 '19 at 21:41