I am doing a comparative study of open source NLP tools and have formed an idea of the features and services of the OpenNLP and CoreNLP engines. Recently I have seen no contributions on the OpenNLP forum, whereas the CoreNLP forum is still active. So I wanted to understand whether Stanford CoreNLP has become more popular and is more widely used in commercial applications. Does anyone have an idea about this?
1 Answer
Apache OpenNLP is actively developed. Take a look at the commit history [1]: there are commits almost every day from different contributors, and the project has cut four releases this year (1.7.0, 1.7.1, 1.7.2, and just recently 1.8.0).
OpenNLP is licensed under the business-friendly Apache License 2.0. CoreNLP is licensed under the GPL, which is difficult to use in commercial software (e.g. software distributed with it must be released under the GPL as well), although Stanford does sell commercial licenses.
OpenNLP is developed mostly by companies which run it in their production systems, whereas CoreNLP is made by researchers at Stanford.
CoreNLP has quite a few dependencies which are pulled into your project, whereas OpenNLP has zero dependencies.
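To make the dependency point concrete, here is a minimal sketch of pulling OpenNLP into a Maven build. It assumes the `org.apache.opennlp:opennlp-tools` artifact on Maven Central and the 1.8.0 release mentioned above; that your project uses Maven at all is an assumption made purely for illustration.

```xml
<!-- Sketch of a pom.xml fragment; it goes inside the <dependencies> element. -->
<!-- Version 1.8.0 is the release named in this answer; adjust as needed. -->
<dependency>
  <groupId>org.apache.opennlp</groupId>
  <artifactId>opennlp-tools</artifactId>
  <version>1.8.0</version>
</dependency>
```

Running `mvn dependency:tree` afterwards lists whatever transitive dependencies the artifact brings in, so the claim can be checked against the version you actually use. The same command on a project that depends on CoreNLP shows what that library pulls in.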
OpenNLP can support you with the following tasks:
- Sentence Detection
- Tokenization
- Chunking
- Named Entity Recognition
- POS Tagging
- Parsing
- Stemming
- Language Model
- Lemmatization
- Document classification
OpenNLP is highly customizable, easy to train on user data, supports training on many publicly available corpora, and features built-in evaluation to measure the performance of every component.
CoreNLP supports these tasks:
- Sentence Detection
- Tokenization
- Named Entity Recognition
- POS Tagging
- Parsing (also dependency parsing)
- Sentiment
- Coreference
- Lemmatization
- Relation Extraction

-
Thank you. Are there production systems/commercial applications based on openNLP? – ShreeVidhya May 25 '17 at 09:40
-
Yes, for example "Ask Oscar", a chat bot from Air New Zealand [1]. Another larger user is Apache cTAKES, a project which makes clinical NLP software and which is used in multiple clinics. [1] https://twitter.com/suneelmarthi/status/862699993651585024 – Joern May 25 '17 at 09:53
-
Accept the reply as answer if you like it. – Joern May 25 '17 at 10:29
-
Apache OpenNLP is also used by the National Institutes of Health (NIH) for epidemiological research: determining occupational risk based on the job description. This is a production application that is actively being used. See https://soccer.nci.nih.gov/soccer/ – Suneel Marthi May 25 '17 at 12:17
-
+1 to Joern's answer. I just want to add that, in my experience, the output quality of the CoreNLP tools is usually a bit better, but training individual components is much easier with OpenNLP (afaik it is impossible for most CoreNLP components). So which one suits you best is a matter of use case (and of licensing, of course, as addressed in the answer). – Igor May 31 '17 at 12:36