
We are using the ANNIE plugin "Document Normalizer" to remove stop words and punctuation symbols. We then run a Gazetteer over the normalized text, and in the last step we need some plugin to recover the original text/position for each Annotation.

How can we achieve that?

Thanks

Valijon

1 Answer


Document Normalizer is not designed to remove words; it replaces one character with another. A typical use case is a tagger that was not trained on some non-ASCII punctuation. See http://gate.ac.uk/userguide/sec:misc-creole:doc-normalizer

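To ignore stop words, use a gazetteer to annotate them. Then, in your JAPE rules, you can skip them with a negated constraint such as {Token.category == NN, !Lookup.majorType == stop}. Because the text itself is never modified, every annotation keeps its offsets in the original document, so there is nothing to recover afterwards.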

This is a lot more flexible, and since some stop words are relevant only in certain cases, you may want to keep them available for the rules that need them.
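As a rough sketch, the negated constraint could sit in a small JAPE phase like the one below. The phase name, rule name, and the output annotation type ContentNoun are made up for illustration. It also assumes that your stop-word gazetteer list is declared with majorType "stop" and that a POS tagger has already set Token.category.

```jape
// Hypothetical phase: names are illustrative, not part of ANNIE.
Phase: NonStopNouns
// Lookup must be listed as input, otherwise the negated test never sees it.
Input: Token Lookup
Options: control = appelt

Rule: NounNotStop
(
  // A noun token with no stop-word Lookup starting at the same offset.
  {Token.category == "NN", !Lookup.majorType == "stop"}
):match
-->
// ContentNoun is an assumed annotation type; it spans the original text.
:match.ContentNoun = {rule = "NounNotStop"}
```

The new annotation is created over the matched Token, so its start and end offsets point straight into the unmodified document.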

Thomas