Questions tagged [opennlp]

Apache's libraries for natural language processing (NLP).

The Apache OpenNLP library is a machine-learning-based toolkit for processing natural language text. It supports the most common NLP tasks, such as tokenization, sentence segmentation, part-of-speech tagging, named entity extraction, chunking, parsing, and coreference resolution. These tasks are usually required to build more advanced text processing services. OpenNLP also includes maximum entropy and perceptron-based machine learning.

More about natural language processing:

Natural language processing (NLP) is the ability of a computer program to understand human language as it is spoken and written.

Apache OpenNLP is often used with Apache Lucene (a document indexing and search library).

Relevant links:

http://searchcontentmanagement.techtarget.com/definition/natural-language-processing-NLP
https://opennlp.apache.org/docs/

Cornerstone books: https://www.manning.com/books/taming-text

684 questions
4
votes
1 answer

Find location name using OpenNLP

I am new to OpenNLP. I use OpenNLP to find location names in a sentence. My input string is "Italy pardons US colonel in CIA case". I cannot find the word "Italy" in the result set. How can I solve this problem? Thanks in advance! try { InputStream…
Dung TQ
  • 51
  • 1
  • 4
4
votes
1 answer

Extract clauses from a sentence

I want to extract the subordinate clause, main clause, relative clause, restrictive relative clause, and non-restrictive relative clause from sentences, but I don't know how to do this. For example: "I first saw her in Paris, where I lived in the early…
SahelSoft
  • 615
  • 2
  • 9
  • 22
4
votes
1 answer

"Found unexpected annotation while handling a name sequence"

I wanted to train my own model for the Named Entity Recognition functionality in OpenNLP. I wrote a piece of code according to http://opennlp.apache.org/documentation/1.5.2-incubating/manual/opennlp.html#tools.namefind. I started with a trivial example…
4
votes
2 answers

PHP code to access API of a Java library

I need to use the Java-based OpenNLP library in my PHP code. For example, I need to use its Sentence Detector component (en-sent.bin) for analysing text variables in my PHP code. In its documentation, that API can be accessed from Java code as…
Orion
  • 1,104
  • 3
  • 16
  • 40
4
votes
3 answers

Reading POS tag models in Android

I have tried POS tagging using the OpenNLP POS models in a normal Java application. Now I would like to implement it on the Android platform. I am not sure what the Android requirements or restrictions are, as I am not able to read the models (binary…
mellissa
  • 87
  • 1
  • 3
  • 13
4
votes
1 answer

How to realize Named entity recognition with OpenNLP for the Albanian language?

I am trying out OpenNLP for the Albanian language. For this I am using OpenNLP and trying to build models for person, location and organisation entity recognition in Albanian. I am building the corpus myself, but I need an OpenNLP expert to…
4
votes
2 answers

Open NLP Name Finder Training

I am building a 15k line training data document called: en-ner-person.train per the online manual (http://opennlp.apache.org/documentation/1.5.2-incubating/manual/opennlp.html). My question is: in my training document, do I include an entire report?…
Chris
  • 18,075
  • 15
  • 59
  • 77
3
votes
2 answers

How to recognise a particular user in a long multi-user internet chat log?

Here is an online programming contest we are planning to have. What could be possible approaches to solving the same? From a random IRC (Internet Relay Chat) log, a small percentage of the user nicknames will be randomly deleted. The participant’s…
3
votes
1 answer

How to split Japanese text?

What is the best way of splitting Japanese text using Java? For example, for the text below: こんにちは。私の名前はオバマです。私はアメリカに行く。 I need the following output: こんにちは 私の名前はオバマです 私はアメリカに行く Is it possible using Kuromoji?
din_oops
  • 698
  • 1
  • 9
  • 27
3
votes
1 answer

OpenNLP gives error when using Thai model

I have tried to follow the advice from here, but I got this error: C:\OpenNLP_models\tool\apache-opennlp-1.5.3-bin\apache-opennlp-1.5.3\bin>opennlp TokenizerME C:\OpenNLP_models\tool\apache-opennlp-1.5.3-bin\apache-opennlp-1.5.3\bin\thai.tok.bin <…
Music
  • 133
  • 1
  • 1
  • 7
3
votes
1 answer

How to prepare training data for OpenNLP to tokenize tokens that contain more than one word?

In some languages (for example, Vietnamese), some vocabulary items consist of multiple words, so tokens that contain more than one word cannot be tokenized using white space alone. I have the following input: Người dân địa phương đã nhiều lần…
Haha TTpro
  • 5,137
  • 6
  • 45
  • 71
3
votes
3 answers

OpenNLP Tokenizer does not detect words that belong together?

I am new to NLP and I came across OpenNLP. From my understanding, tokenization means segmenting text into words and sentences. Words are often separated by white spaces, but not all white spaces are equal. For example, Los Angeles is an individual…
p192
  • 518
  • 1
  • 6
  • 19
3
votes
0 answers

java.lang.OutOfMemoryError: Java heap space: failed reallocation of scalar replaced objects - Error when training a custom model using the OpenNLP API

I am getting the error below when I train a custom NER model using the OpenNLP API with more than 2 million sentences. java.lang.OutOfMemoryError: Java heap space: failed reallocation of scalar replaced objects. I tried the solution below, but it's not working…
MAK
  • 43
  • 1
  • 7
3
votes
2 answers

How to create a gazetteer-based Named Entity Recognition (NER) system?

I have tried my hand at many NER tools (OpenNLP, Stanford NER, LingPipe, DBpedia Spotlight, etc.). But what has constantly evaded me is a gazetteer/dictionary-based NER system where my free text is matched with a list of pre-defined entity names, and…
Vini
  • 313
  • 1
  • 7
  • 21
3
votes
0 answers

How to handle emojis correctly in OpenNLP?

For example, for this sentence, "Hoseok yelled out Puma at the end", the tokenized emojis become "????". Is this an issue with OpenNLP or something else?
user697911
  • 10,043
  • 25
  • 95
  • 169