OpenNLP NLP TOOL for keyword extraction

Question

I want to extract keywords/tags from a set of documents (PDF, DOCX, TXT) using the OpenNLP API, for tagging purposes.

Can anyone suggest how I can use OpenNLP for keyword extraction?

score 1 · Answer 1 · edited Oct 25 '17 at 13:08

Welcome to SO! If you think of a "keyword" as a relative term, then OpenNLP can help you in several ways. For instance, you can use the part-of-speech tagger to extract nouns and index only the nouns as keywords (you could do the same for verbs). You could use the chunker to extract noun phrases or verb phrases and index the phrases. You could perform Named Entity Recognition with the name finder and index the entities by type (then your search engine could enable searching specifically on people's names or the names of organizations). This can be powerful, depending on your use case. To get the text out of the PDF and DOC/DOCX files, you should think about using Apache Tika.
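As a rough illustration of the "index only the nouns" idea, here is a minimal sketch in plain Java. It only shows the filtering step: it takes parallel arrays of tokens and tags, which is the shape OpenNLP's `POSTaggerME.tag(String[])` returns, and counts the tokens tagged as nouns. The class name `NounKeywords`, the method `nounCounts`, and the sample sentence are made up for this example, and it assumes a model that emits Penn Treebank tags (`NN`, `NNS`, `NNP`, `NNPS`); a model with a different tag set would need a different check.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper, not part of OpenNLP.
class NounKeywords {

    // Counts the tokens whose tag starts with "NN" (Penn Treebank noun tags).
    // tokens[i] and tags[i] must describe the same word.
    static Map<String, Integer> nounCounts(String[] tokens, String[] tags) {
        if (tokens.length != tags.length) {
            throw new IllegalArgumentException("tokens and tags must have the same length");
        }
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (int i = 0; i < tokens.length; i++) {
            if (tags[i].startsWith("NN")) {
                // Lowercase so "Dogs" and "dogs" count as one keyword.
                counts.merge(tokens[i].toLowerCase(Locale.ROOT), 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // Hand-written tags standing in for tagger output. With OpenNLP on the
        // classpath and a POS model loaded, they would come from something like:
        //   POSTaggerME tagger = new POSTaggerME(new POSModel(modelInputStream));
        //   String[] tags = tagger.tag(tokens);
        String[] tokens = {"Dogs", "chase", "cats", "and", "dogs"};
        String[] tags = {"NNS", "VBP", "NNS", "CC", "NNS"};
        System.out.println(nounCounts(tokens, tags));
    }
}
```

The resulting counts could then be sorted or thresholded to pick the tags you actually index. The same pattern would apply to chunker or name finder output, with phrases or entity spans in place of single tokens.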

Here are some links to other SO questions

Also, if you are using Solr, I think some work has been done to use OpenNLP as a tokenizer, though I have never used it.
