I have an issue with the Sentence Splitter module in GATE. My text looks something like this:

Social history. He drank a lot in his young age. He did
not attend a school. He was depressed of his condition.

The sentences should clearly be split like this:

Sentence 1: Social history.
Sentence 2: He drank a lot in his young age.
Sentence 3: He did not attend a school.
Sentence 4: He was depressed of his condition.

The ANNIE Sentence Splitter instead treats text on different lines as belonging to different sentences, which produces this:

Sentence 1: Social history.
Sentence 2: He drank a lot in his young age.
Sentence 3: He did 
Sentence 4: not attend a school.
Sentence 5: He was depressed of his condition.

That is because the sentence is spread over multiple lines. Is there a way to tell the sentence splitter that a sentence might span more than one line? Or is there a better method for recognising sentences in this type of text?

Thank you :)

A. U.
  • You may be passing a single line to the sentence splitter. You should read the complete file first and pass the complete text to the sentence splitter. – RAVI Aug 11 '16 at 12:58
  • Actually I am using GATE Developer, so I think I pass all the sentences at once @RAVI – A. U. Aug 11 '16 at 14:45

1 Answer

Try using the RegEx Sentence Splitter instead of the ANNIE one.

Alternatively, the ANNIE Sentence Splitter has a parameter, TransducerURL, which by default points to something like:

/PATH-TO-GATE/plugins/ANNIE/resources/sentenceSplitter/grammar/main-single-nl.jape

In the same folder there is also a JAPE file called:

/PATH-TO-GATE/plugins/ANNIE/resources/sentenceSplitter/grammar/main.jape

If you point TransducerURL to this file instead, it should work.
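
To make the difference between the two behaviours concrete, here is a rough Python sketch. It is not GATE's grammar or API, just a toy splitter I am assuming for illustration: the function name `split_sentences` and the flag `newline_is_boundary` are made up. With the flag on, every line break ends a sentence (the behaviour described in the question); with it off, only blank lines and sentence-final punctuation do.

```python
import re


def split_sentences(text, newline_is_boundary=False):
    """Toy sentence splitter (illustration only, not GATE's JAPE rules)."""
    if newline_is_boundary:
        # Strict mode: every single line break ends a sentence.
        chunks = text.split("\n")
    else:
        # Lenient mode: only a blank line (two or more newlines) ends a sentence.
        chunks = re.split(r"\n\s*\n", text)

    sentences = []
    for chunk in chunks:
        # Collapse any remaining whitespace, including single newlines.
        chunk = " ".join(chunk.split())
        if not chunk:
            continue
        # Split after '.', '!' or '?' when followed by whitespace.
        sentences.extend(re.split(r"(?<=[.!?])\s+", chunk))
    return sentences


text = (
    "Social history. He drank a lot in his young age. He did\n"
    "not attend a school. He was depressed of his condition."
)

strict = split_sentences(text, newline_is_boundary=True)
lenient = split_sentences(text)

for i, sentence in enumerate(lenient, start=1):
    print(f"Sentence {i}: {sentence}")
```

In strict mode the sample text comes out as five pieces, with "He did" cut off from the rest of its sentence; in lenient mode it comes out as the four sentences you expect. The trade-off is that in lenient mode a line that does not end with a full stop gets merged into the following line.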

Yasen
  • Thank you. It is actually documented on the website; my bad, I didn't check it. I tried the approaches you mentioned and it works! But another issue arises: some of the lines do not end with a full stop, and the sentence splitter runs them over into the next line. So I guess I have to decide which option gives more benefit with fewer drawbacks. – A. U. Aug 14 '16 at 14:22
  • If it's a problem for you, you can try to edit the rule files. Maybe you'll figure out a way to catch the special cases :) – Yasen Aug 14 '16 at 20:39