Version

  • Windows 7
  • Python 2.7.10
  • NLTK 3.1
  • Stanford NER 3.6 (2015-12-09)

Issue

I trained a custom NER model with Stanford NER and got a serialized model. Then I tried to use the model to run NER on an unseen corpus via the Python API provided by NLTK.

According to the documentation, I should specify the path to the model and the path to stanford-ner.jar. However, I need to specify both the path to stanford-ner.jar and the path to slf4j-api.jar, because Stanford NER requires the logging library.

I could not figure out how to specify the two paths in the NLTK API. The constructor takes two arguments: the first is path/to/model and the second is path/to/jar. I tried concatenating the two jar paths into one string, and passing them as a list and as a tuple, but none of these worked.

How can I tell NLTK to find both jars so that I can run prediction?

Zelong

1 Answer

Could you provide the code you tried? Looking at the source code of the nltk.tag.stanford module, it seems you just have to instantiate StanfordNERTagger with the model and the path to the stanford-ner JAR, and the logger is added to the classpath automatically.
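As a minimal sketch of that call, assuming NLTK is installed and Java is available: make_ner_tagger is a hypothetical helper name introduced here for illustration, and it only adds an early check that both files exist before handing them to StanfordNERTagger.

```python
import os


def make_ner_tagger(model_path, ner_jar_path):
    """Build a StanfordNERTagger from a model file and the stanford-ner JAR.

    Hypothetical helper: only the two paths the constructor expects are passed.
    """
    # Fail early with a clear message if either path is wrong.
    for path in (model_path, ner_jar_path):
        if not os.path.isfile(path):
            raise IOError("File not found: %s" % path)

    # Imported lazily so the path check above runs even without NLTK installed.
    from nltk.tag.stanford import StanfordNERTagger

    # First argument: serialized model; second argument: path to stanford-ner.jar.
    return StanfordNERTagger(model_path, ner_jar_path)
```

With real paths in place of the placeholders, make_ner_tagger('path/to/model', 'path/to/stanford-ner.jar').tag('Some unseen text'.split()) should return a list of (token, label) pairs.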

This is what the __init__() method of the StanfordTagger superclass does right after setting the model and the engine. It looks for the logger JAR inside the parent folder of the stanford-ner JAR path you provide and adds it to the classpath implicitly by calling find_jars_within_path() from nltk.internals, which under the hood collects the .jar files found under that folder. So it should be enough to keep slf4j-api.jar in the same folder as stanford-ner.jar (or in a subfolder of it).
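The lookup described above can be approximated with the standard library alone. This is a sketch of the idea, not NLTK's actual code: jars_in_folder_of and build_classpath are hypothetical names, and the assumption is that every .jar under the folder holding stanford-ner.jar belongs on the Java classpath.

```python
import fnmatch
import os


def jars_in_folder_of(jar_path):
    """Return every .jar file found under the folder that contains jar_path."""
    folder = os.path.dirname(os.path.abspath(jar_path))
    found = []
    # Walk the folder and its subfolders, keeping only *.jar files.
    for root, _dirs, files in os.walk(folder):
        for name in fnmatch.filter(files, "*.jar"):
            found.append(os.path.join(root, name))
    return sorted(found)


def build_classpath(jar_path):
    """Join the discovered JARs with the platform's classpath separator."""
    return os.pathsep.join(jars_in_folder_of(jar_path))
```

Printing build_classpath('path/to/stanford-ner.jar') is a quick way to check whether slf4j-api.jar would be picked up from where it currently sits.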