What logic does the sentence detection class in the OpenNLP API use? Is it:

  • detection based on ".", or
  • the longest whitespace-trimmed character sequence, or
  • something else?

Could somebody explain this?

Also: how is parsing done in the Parsing API, i.e., what logic does it use?

MWiesner
anamika

1 Answer

The official OpenNLP documentation (chapter 2) should give you a basic understanding. It states:

The OpenNLP Sentence Detector can detect that a punctuation character marks the end of a sentence or not. In this sense a sentence is defined as the longest white space trimmed character sequence between two punctuation marks. The first and last sentence make an exception to this rule. The first non whitespace character is assumed to be the begin of a sentence, and the last non whitespace character is assumed to be a sentence end. The sample text below should be segmented into its sentences....

Internally, OpenNLP uses pre-trained models for that. These models are available for different languages and cover a broad range of linguistic characteristics.
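To make the quoted definition more concrete, here is a toy sketch in plain Java. It is not OpenNLP code and does not use the OpenNLP API: the class name, the abbreviation list and the `isSentenceEnd` rule are all hypothetical. The hand-written rule only stands in for the yes/no decision that OpenNLP's trained model makes at each candidate punctuation character.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Toy illustration only; OpenNLP itself decides with a trained statistical model.
class ToySentenceSplitter {

    // Hypothetical abbreviation list; a trained model learns such cues from data instead.
    private static final Set<String> ABBREVIATIONS = Set.of("Mr", "Mrs", "Dr", "e.g", "i.e");

    // Stand-in for the model's decision: does the punctuation mark at pos end a sentence?
    static boolean isSentenceEnd(String text, int pos) {
        // Find the token that directly precedes the punctuation mark.
        int start = pos;
        while (start > 0 && !Character.isWhitespace(text.charAt(start - 1))) {
            start--;
        }
        String token = text.substring(start, pos);
        if (ABBREVIATIONS.contains(token)) {
            return false;
        }
        // A mark followed directly by a non-whitespace character (as in "3.50") is not an end.
        return pos + 1 >= text.length() || Character.isWhitespace(text.charAt(pos + 1));
    }

    // Returns the whitespace-trimmed character sequences between accepted sentence ends.
    static List<String> split(String text) {
        List<String> sentences = new ArrayList<>();
        int begin = 0;
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if ((c == '.' || c == '!' || c == '?') && isSentenceEnd(text, i)) {
                String sentence = text.substring(begin, i + 1).trim();
                if (!sentence.isEmpty()) {
                    sentences.add(sentence);
                }
                begin = i + 1;
            }
        }
        // The last non-whitespace character is treated as a sentence end.
        String tail = text.substring(begin).trim();
        if (!tail.isEmpty()) {
            sentences.add(tail);
        }
        return sentences;
    }

    public static void main(String[] args) {
        for (String s : split("  Mr. Smith paid 3.50 dollars. Was it enough? Yes ")) {
            System.out.println(s);
        }
    }
}
```

For the sample input this yields three sentences: "Mr. Smith paid 3.50 dollars.", "Was it enough?" and "Yes". So the answer to the question is "something else": not every "." is a boundary, and the decision at each candidate character is what the pre-trained model supplies.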

However, you can also train your own models, which might better fit the text material you want to feed into the sentence detector. The corresponding section in the OpenNLP documentation and the associated JavaDoc page should guide you.

If you want to look deeper into the parsing process, you could also read this StackOverflow question and its answers, which discuss ParserModel and how to use the related classes.

Hope it helps.

Community
MWiesner