
I have been searching for a solution to this for weeks. I have some documents (about 95) that I am trying to classify using GATE. I have put them in one corpus, which I called training_corpus. However, after ANNIE has annotated the corpus, I have to go back into each file, select all tokens in the document, and create an annotation called Mention with a feature named type whose value is the class of the document. For example:

Type     Start  End    Id    Features
Mention  0      70000  2588  {type=neg}

Is there any way to do this automatically with JAPE? Basically, I want to select all tokens and create a new annotation with the feature type=<class>. The class is also part of the document name. Since there are many documents, can JAPE extract the class from the document name and set it as the value of the Mention feature? For example, if the document name is neg_data1.txt, the annotation would have Mention.type = neg.

Any help will be greatly appreciated. Thanks

tigg

1 Answer


I think you have answered your own question. If the class assignment is based only on a token present in the text, why not simply process the text outside of GATE? For example, create an XML file in which the text is wrapped in markup for its class, and then use that in the training process. Alternatively, you can write a simple JAPE rule that: a) takes the text within the document boundaries (see the gate.Utils.length methods, if I remember correctly); and b) based on the presence of your token, creates a new annotation with the necessary features. An abstract example:

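Phase: Instance
Input: Token
Options: control = once

Rule: Instance
(
  {Token}
):instance
-->
{
  // "INSTANCE_ANNOTATION" is a placeholder for the annotation type whose
  // presence decides the class of the document.
  AnnotationSet instances = outputAS.get("INSTANCE_ANNOTATION");
  FeatureMap featureMap = Factory.newFeatureMap();
  if (instances != null && !instances.isEmpty()) {
    featureMap.put("type", "class when annotation is present in doc");
  } else {
    featureMap.put("type", "class when annotation is not in doc");
  }
  // Span the whole document: from offset 0 to the length of the content.
  try {
    outputAS.add(0L, doc.getContent().size(), "Mention", featureMap);
  } catch (InvalidOffsetException e) {
    throw new LuckyException(e);
  }
}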
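The XML route (the one that ended up working for the asker, per the comment below) can be sketched in plain Java, outside GATE. This is a minimal sketch under two assumptions: the class is the part of the file name before the first underscore (neg_data1.txt gives neg), and GATE's XML document format turns each element into an annotation in the "Original markups" set, with the element's attributes as features. Under those assumptions, wrapping the whole text in a `<Mention type="neg">` element should yield the whole-document Mention annotation directly. `MentionXml`, `classFromFileName` and `toMentionXml` are hypothetical names, not part of GATE.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class MentionXml {

    /** Hypothetical helper: "neg_data1.txt" -> "neg" (text before the first underscore). */
    public static String classFromFileName(String fileName) {
        int underscore = fileName.indexOf('_');
        if (underscore <= 0) {
            throw new IllegalArgumentException("No class prefix in: " + fileName);
        }
        return fileName.substring(0, underscore);
    }

    /** Escapes the characters that are special in XML text and attribute values. */
    public static String escapeXml(String s) {
        return s.replace("&", "&amp;")   // must run first so later entities are not re-escaped
                .replace("<", "&lt;")
                .replace(">", "&gt;")
                .replace("\"", "&quot;");
    }

    /** Wraps the whole text in one Mention element whose type attribute is the class. */
    public static String toMentionXml(String fileName, String text) {
        String cls = escapeXml(classFromFileName(fileName));
        return "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n"
                + "<Mention type=\"" + cls + "\">" + escapeXml(text) + "</Mention>\n";
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 0) {
            // Demo without any input files.
            System.out.print(toMentionXml("neg_data1.txt", "This was a bad movie & I left."));
            return;
        }
        // Convert each .txt file named on the command line into a sibling .xml file.
        for (String arg : args) {
            Path in = Paths.get(arg);
            String text = new String(Files.readAllBytes(in), StandardCharsets.UTF_8);
            Path out = Paths.get(arg.replaceFirst("\\.txt$", "") + ".xml");
            String xml = toMentionXml(in.getFileName().toString(), text);
            Files.write(out, xml.getBytes(StandardCharsets.UTF_8));
        }
    }
}
```

Running `java MentionXml neg_data1.txt pos_data2.txt` would write neg_data1.xml and pos_data2.xml; those files can then be loaded into the corpus with "Populate". Splitting at the first underscore is a deliberate choice: it also copes with a name that has extra text after the class prefix, such as a suffix GATE may append to document names.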
ashingel
  • Thanks, I created an XML file and used the marked-up annotations as my annotation set, and it worked. Thanks a lot. However, I have a quick question: how can I load many files (GATE documents) all at once in the GATE GUI? Thanks – tigg Apr 03 '14 at 01:12
  • @user3183103 To load all documents at once, create a corpus, right-click on it, and choose "Populate" from the pop-up menu. For large collections of documents that would exceed your memory limits, consider using a GATE Datastore. – andrey Apr 03 '14 at 09:10
  • @andrey You are awesome, thanks so much, and thanks to ashingel for answering my question. Thanks a bunch – tigg Apr 04 '14 at 00:15
  • I want to upvote this question and answer 100 times. Extremely useful. Thanks. – pnv Feb 16 '15 at 11:19