I was checking out Stanford CoreNLP to understand NER and POS tagging. But what if I want to create custom tags for entities like <title>Nights</title>, <genre>Jazz</genre>, <year>1992</year>?
How can I do this? Is CoreNLP useful in this case?

Yes, CoreNLP can use custom "tags". A year such as 1992 should already be tagged as DATE. An easy way is to use the gazette feature. You need to read the docs carefully, several times. – Neil McGuigan Jan 27 '14 at 19:49
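To illustrate the dictionary (gazetteer) idea independently of CoreNLP, here is a minimal pure-Python sketch. The `GAZETTEER` entries, the `tag` function and the year regex are all hypothetical; CoreNLP's own mechanisms (the `regexner` annotator, or the gazette options of the CRF classifier) are configured through their documented files and properties, not through code like this. Note that the sketch tags the year with a crude regex, whereas CoreNLP would normally label it DATE.

```python
import re

# Hypothetical gazetteer: lower-cased token sequence -> tag name.
# In practice these lists would come from a catalogue or database.
GAZETTEER = {
    ("nights",): "title",
    ("jazz",): "genre",
    ("smooth", "jazz"): "genre",
}
# Rough heuristic for years 1800-2099 (an assumption, not a CoreNLP rule).
YEAR = re.compile(r"^(1[89]|20)\d\d$")


def tag(tokens):
    """Wrap gazetteer matches (longest first) and year-like tokens in inline tags."""
    max_len = max(len(key) for key in GAZETTEER)
    out, i = [], 0
    while i < len(tokens):
        # Try the longest possible phrase starting at position i first.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            key = tuple(t.lower() for t in tokens[i:i + n])
            if key in GAZETTEER:
                label = GAZETTEER[key]
                out.append(f"<{label}>{' '.join(tokens[i:i + n])}</{label}>")
                i += n
                break
        else:
            # No gazetteer phrase starts here: fall back to the year pattern.
            tok = tokens[i]
            out.append(f"<year>{tok}</year>" if YEAR.match(tok) else tok)
            i += 1
    return " ".join(out)


print(tag("Nights is a smooth jazz album from 1992".split()))
```

Longest-match-first means "smooth jazz" wins over the shorter "jazz" entry; a pure lookup like this cannot disambiguate (e.g. "Nights" as a plain word), which is why the answers below suggest training a statistical model.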
2 Answers
Out of the box, CoreNLP is restricted to the types it documents: PERSON, LOCATION, ORGANIZATION, MISC, DATE, TIME, MONEY, NUMBER. No, you won't be able to recognize other entity types just by assuming it could "intuitively" do it :)
In practice, you'll have to choose one of these options:
- Find another NER system that tags those types.
- Address this tagging task using knowledge-based / unsupervised approaches.
- Search for extra resources (corpora) that contain the types you want to recognize, and re-train a supervised NER system (CoreNLP or other).
- Build (and possibly annotate) your own resources. You'll then have to define an annotation scheme, rules, etc. Quite an interesting part of the work!
Indeed, unless you find an existing system that fulfills your needs, some effort will be required! Unsupervised approaches may help you bootstrap a system and see whether you need to find or annotate a dedicated corpus. In the latter case, it is better to split the data into train/dev/test parts, so that you can assess how well the resulting system performs on unseen data.
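A minimal sketch of that split-and-evaluate advice, assuming each annotated entity is represented as a `(sentence_id, start, end, label)` tuple. That representation and the `split` / `prf` helpers are hypothetical; scoring uses a strict exact-match criterion (span and label must both match), similar in spirit to CoNLL-style entity evaluation.

```python
import random


def split(sentences, dev=0.1, test=0.1, seed=0):
    """Shuffle whole sentences (not tokens, to avoid leakage) into train/dev/test."""
    items = list(sentences)
    random.Random(seed).shuffle(items)
    n_test = int(len(items) * test)
    n_dev = int(len(items) * dev)
    return items[n_test + n_dev:], items[n_test:n_test + n_dev], items[:n_test]


def prf(gold, predicted):
    """Exact-match entity precision, recall and F1 over sets of entity tuples."""
    gold, predicted = set(gold), set(predicted)
    tp = len(gold & predicted)
    p = tp / len(predicted) if predicted else 0.0
    r = tp / len(gold) if gold else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f


train_set, dev_set, test_set = split(range(100))
gold = {(0, 0, 1, "TITLE"), (0, 3, 5, "GENRE"), (0, 7, 8, "YEAR")}
pred = {(0, 0, 1, "TITLE"), (0, 4, 5, "GENRE"), (0, 7, 8, "YEAR")}
# The predicted GENRE span has the wrong start, so it counts as both a miss and a false alarm.
print(prf(gold, pred))
```

The dev set is for tuning features and rules; the test set should only be touched once, at the end, to report performance on unseen data.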

Look into this FAQ (http://nlp.stanford.edu/software/crf-faq.shtml) to see how to use the CRF classifier to train a model for new classes. You may find it useful.
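Training the CRF classifier needs labeled data, so a first practical step is turning sentences annotated like the question's `<genre>Jazz</genre>` into a column file. The sketch below assumes the two-column layout described in that FAQ (one token per line, token and label separated by a tab, `O` for non-entities, a blank line between sentences); check the FAQ for the exact format and the matching properties file. The `to_tsv` helper is hypothetical and not part of CoreNLP, and it splits on whitespace, whereas ideally you would tokenize the same way CoreNLP does at run time.

```python
import re

# Matches <label>text</label>; assumes tags are never nested (hypothetical input format).
TAG = re.compile(r"<(\w+)>(.*?)</\1>")


def to_tsv(sentence, background="O"):
    """Turn one inline-tagged sentence into token<TAB>label lines."""
    lines, pos = [], 0
    for m in TAG.finditer(sentence):
        # Untagged text before this match gets the background label.
        for tok in sentence[pos:m.start()].split():
            lines.append(f"{tok}\t{background}")
        # Every token inside the tag gets the tag name, upper-cased.
        for tok in m.group(2).split():
            lines.append(f"{tok}\t{m.group(1).upper()}")
        pos = m.end()
    # Trailing untagged text after the last match.
    for tok in sentence[pos:].split():
        lines.append(f"{tok}\t{background}")
    return "\n".join(lines)


sentences = [
    "<title>Nights</title> is a <genre>smooth jazz</genre> album from <year>1992</year> .",
]
# A blank line separates sentences in the training file.
print("\n\n".join(to_tsv(s) for s in sentences))
```

Each token inside a tag gets the same label (no B-/I- prefixes), which keeps the file simple; whether to use a richer encoding is a separate design choice.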
