I want to remove stopwords from a given text with GATE. To do that I use a Tokenizer and a Gazetteer: the Gazetteer returns the stopwords that I want to delete. As far as I know there is no GATE plugin for deleting words, is there? So I want to do it with a Groovy script, but I don't know how. I think I should be able to get the positions of the stopwords from the Gazetteer.
I also know there is the method edit(), but it doesn't work as expected:
Long start = // start offset of a stopword
Long end = // end offset of a stopword
doc.edit(start, end, DocumentContentImpl(""))
The last line throws an exception, and I couldn't figure out how to use edit() correctly, or what else I could do to remove the stopwords.
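
For illustration, here is a minimal sketch of the approach described above. It is not a confirmed solution and rests on several assumptions: the script runs inside GATE's Groovy scripting PR, where doc and inputAS are bound; the Gazetteer marks stopwords with annotations of type "Lookup"; and the stopword spans do not partially overlap. One likely cause of the exception is that DocumentContentImpl("") is written without new, which Groovy treats as a call to a method of that name rather than a constructor. The sketch also deletes from the end of the document backwards, so that removing one span does not shift the offsets of the spans still to be removed.

```groovy
import gate.corpora.DocumentContentImpl

// Assumption: `doc` and `inputAS` are the variables bound by the Groovy scripting PR,
// and the Gazetteer has annotated each stopword as a "Lookup" annotation.
// Copy the offsets out first, because editing the document also changes its annotations.
def spans = inputAS.get("Lookup").collect { ann ->
    [ann.startNode.offset, ann.endNode.offset]
}.unique()   // several Lookups may cover the same span; delete each span only once

// Sort by start offset, descending, so earlier offsets stay valid after each deletion.
spans.sort { a, b -> b[0] <=> a[0] }.each { span ->
    // Replace the span with empty content; note the `new` before the constructor.
    doc.edit(span[0], span[1], new DocumentContentImpl(""))
}
```

This only removes the stopword characters themselves, so the whitespace around each removed word stays in the text. If the Gazetteer also contains lists other than stopwords, the Lookup annotations would need to be filtered first, for example by their majorType feature.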