
I am trying to detect sentences with GATE, using either the ANNIE SentenceSplitter or the RegexSentenceSplitter.

The RegexSentenceSplitter works very well, except for one problem: it creates a new sentence annotation at the beginning of each new page of the document (the documents analysed are PDFs).

Is it possible to change this behavior of the RegexSentenceSplitter?

Harry Wells

1 Answer

You could try a conditional corpus pipeline. It lets you decide whether a PR (processing resource, here the RegexSentenceSplitter) runs or not, according to the value of a feature on the document.

More details here: https://gate.ac.uk/sale/tao/splitch3.html#x6-480003.8.2
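
To make the idea concrete, here is a minimal plain-Java sketch of the logic a conditional pipeline applies: each step carries a feature name and value, and it only runs when the document has that feature set to that value. This is only a model of the behaviour, not the GATE API. The classes `Doc` and `Step`, the feature name `runSplitter` and the value `yes` are all made up for illustration. In GATE itself you would configure this on a conditional controller (for example `ConditionalSerialAnalyserController`) instead of writing it by hand.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;

// Plain-Java model of a conditional pipeline. None of these types are GATE classes.
class ConditionalPipelineSketch {

    // Hypothetical stand-in for a document: a feature map plus a log of the steps that ran.
    static class Doc {
        final Map<String, String> features = new HashMap<>();
        final List<String> stepsRun = new ArrayList<>();
    }

    // A processing step guarded by a feature name/value pair.
    // A null featureName means the step always runs.
    static class Step {
        final String name;
        final String featureName;
        final String featureValue;

        Step(String name, String featureName, String featureValue) {
            this.name = name;
            this.featureName = featureName;
            this.featureValue = featureValue;
        }

        boolean shouldRun(Doc doc) {
            if (featureName == null) {
                return true;
            }
            // Run only when the document feature equals the expected value.
            return Objects.equals(featureValue, doc.features.get(featureName));
        }
    }

    // Example pipeline: the tokeniser always runs, the splitter is conditional.
    // "runSplitter" and "yes" are assumed names, chosen for this sketch only.
    static List<Step> defaultPipeline() {
        List<Step> pipeline = new ArrayList<>();
        pipeline.add(new Step("tokeniser", null, null));
        pipeline.add(new Step("sentenceSplitter", "runSplitter", "yes"));
        return pipeline;
    }

    // Executes the pipeline, recording the name of every step that was allowed to run.
    static List<String> run(List<Step> pipeline, Doc doc) {
        for (Step step : pipeline) {
            if (step.shouldRun(doc)) {
                doc.stepsRun.add(step.name);
            }
        }
        return doc.stepsRun;
    }

    public static void main(String[] args) {
        Doc withSplitter = new Doc();
        withSplitter.features.put("runSplitter", "yes");
        System.out.println(run(defaultPipeline(), withSplitter));

        Doc withoutSplitter = new Doc();
        System.out.println(run(defaultPipeline(), withoutSplitter));
    }
}
```

Note that the condition applies to a whole document: the splitter either runs on it or is skipped entirely, depending on the feature.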