I am trying to do document classification with GATE. For that, I need to annotate the entire document with a single annotation of one type. Can anyone please tell me how to do that?
2 Answers
Usually I use XML for that purpose. Something like:
<document class="class-1">
The text of your document 1 is here..
</document>
<document class="class-2">
The text of your document 2 is here..
</document>
Then save these XML snippets as separate files (or as one document; in that case, wrap them in a single root element so the file stays well-formed XML).
In your GATE application you can then use the Annotation Set Transfer PR to move the annotation from "Original markups" to the default annotation set. This is one of the options. Other options depend on the data format you have.
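The wrapping step above can be scripted outside GATE if the documents start out as plain text. The following is only a minimal sketch in Python, not part of GATE itself: the helper names `wrap_document` and `write_corpus` and the `doc-N.xml` file naming are hypothetical, and it assumes you already have each document's text together with its class label.

```python
from pathlib import Path
from xml.sax.saxutils import escape, quoteattr


def wrap_document(text, label):
    """Wrap plain text in one <document class="..."> element.

    The element and attribute names follow the example above;
    escape() and quoteattr() keep the result well-formed XML.
    """
    return f"<document class={quoteattr(label)}>\n{escape(text)}\n</document>\n"


def write_corpus(samples, out_dir):
    """Write one XML file per (text, label) pair and return the paths.

    The doc-N.xml naming is an arbitrary choice for this sketch.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    for i, (text, label) in enumerate(samples, start=1):
        path = out_dir / f"doc-{i}.xml"
        path.write_text(wrap_document(text, label), encoding="utf-8")
        paths.append(path)
    return paths
```

Calling `write_corpus([("first text", "class-1"), ("second text", "class-2")], "corpus")` would produce one file per document, each with a single root element spanning the whole text, which is the shape the Annotation Set Transfer step expects to find in "Original markups".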

ashingel
If your source documents are HTML or XML, there will already be an annotation in the Original markups set that spans all the content. Otherwise, the simplest option is to load the Groovy plugin and use the scripting PR with a one-line script like:
outputAS.add(doc.start(), doc.end(), "Document", Utils.featureMap())

Ian Roberts