
I was wondering if someone can help me out here. I think this could be useful to anyone trying to do machine learning in GATE (General Architecture for Text Engineering). To run machine learning, I first need to add some code to a few JAPE files so that my output XML file includes the Annotation Id value as a feature. An example is provided below:

<Annotation Id="1491" Type="Person" StartNode="288" EndNode="301">
<Feature>
  <Name className="java.lang.String">id</Name>
  <Value className="java.lang.String">1491</Value>
</Feature>
</Annotation>

(Note that the feature value of 1491 matches the Annotation Id="1491". This is what I want.)

WHY I NEED THIS: I am conducting machine learning on a plain text document that initially contains no annotations. I am using the June 2012 training course on the GATE website as a guide, specifically the Module 11: Relations tutorial (it finds employment relationships between persons and organizations). I use the corpus of 93 pre-annotated documents for training and then apply the learned model to my document. But first I run my document through ANNIE. It creates many annotations and features, but not everything that I need for machine learning. I've learned through trial and error that my annotated document must contain a feature holding the Annotation Id for every "Person" and "Organization" annotation. The configuration file (relations-config.xml) used by the Batch Learning PR looks for id features on the "Person" and "Organization" types, and it will not run if these id features are not present. So I add them manually and then run the document through the machine learning "APPLICATION" mode. It works rather nicely. However, I clearly do not want to add the id features to my XML file manually every time.

WHAT I HAVE FIGURED OUT WITH THE GATE CODE: I believe I have found the code files (final.jape, org_context.jape and name_context.jape) that I need to alter so that they add the id feature to every "Person" and "Organization" annotation. I don't understand the language that GATE uses very well (I'm a mechanical engineer, not a software engineer), and this is probably why I can't figure this out (Ha!). I could be off and may need to add a few more lines for the JAPE file to work properly, but I feel like I've pinpointed it pretty closely. There are two sections of code that are similar but slightly different, and they are currently the bane of my existence. The first one goes through an iterator, the second one does not. I have copied both below, with a placeholder WHAT_DO_I_PUT_HERE marking where I think my problem and its solution lie. I would be very grateful if someone could tell me what I need to write there to get my result.

Thank you! - Todd

//////////// First section of code ////////////////

Rule: PersonFinal
Priority: 30
//({JobTitle}
//)?
(
 {TempPerson.kind == personName}
)
:person
--> 
{
gate.FeatureMap features = Factory.newFeatureMap();
gate.AnnotationSet personSet = (gate.AnnotationSet)bindings.get("person");
gate.Annotation person1Ann = (gate.Annotation)personSet.iterator().next();


gate.AnnotationSet firstPerson = (gate.AnnotationSet)personSet.get("TempPerson");
if (firstPerson != null && firstPerson.size()>0)
{
  gate.Annotation personAnn = (gate.Annotation)firstPerson.iterator().next();
  if (personAnn.getFeatures().containsKey("gender")) features.put("gender", personAnn.getFeatures().get("gender"));
}
  features.put("id", WHAT_DO_I_PUT_HERE.getId().toString());
  features.put("rule1", person1Ann.getFeatures().get("rule"));
  features.put("rule", "PersonFinal");
outputAS.add(personSet.firstNode(), personSet.lastNode(), "Person", features);
outputAS.removeAll(personSet);
}

//////////// Second section of code ////////////////

Rule:OrgContext1
Priority: 1
// company X
// company called X

(
 {Token.string == "company"}
 (({Token.string == "called"}|
   {Token.string == "dubbed"}|
   {Token.string == "named"}
  )
 )?
)
( 
 {Unknown.kind == PN}
)
:org
-->
{
gate.AnnotationSet org = (gate.AnnotationSet) bindings.get("org");
gate.FeatureMap features = Factory.newFeatureMap();
features.put("id", WHAT_DO_I_PUT_HERE.getId().toString());
features.put("rule ", "OrgContext1");
outputAS.add(org.firstNode(), org.lastNode(), "Organization", features);
outputAS.removeAll(org);
}

2 Answers


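You cannot access the annotation id before the actual annotation is created. However, outputAS.add(...) returns the id of the annotation it creates, so you can create the annotation first and then store the returned id in its feature map. My solution to this problem: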

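Rule:PojemId
(
 {PojemD}
):pojem
--> 
{
    AnnotationSet matchedAnns = bindings.get("pojem");
    Annotation ann = matchedAnns.get("PojemD").iterator().next();

    // Copy the features of the matched annotation and record its old id.
    FeatureMap pojemFeatures = ann.getFeatures();
    gate.FeatureMap features = Factory.newFeatureMap();
    features.putAll(pojemFeatures);
    features.put("annId", ann.getId());

    // Replace the matched annotation with a new one; add() returns the new id.
    inputAS.remove(ann);
    Integer id = outputAS.add(matchedAnns.firstNode(), matchedAnns.lastNode(), "PojemD", features);
    // Store the new id as a feature (an Integer here; use id.toString() for a String).
    features.put("id", id);
}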
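
Applied to the OrgContext1 rule from the question, the same idea might look like the sketch below. This is only an illustration under assumptions, not tested against ANNIE: the left-hand side and the feature names are copied from the question, newId and newOrg are hypothetical variable names, and the id is written through outputAS.get(newId) so that the sketch does not depend on the annotation sharing the features map passed to add().

```
Rule: OrgContext1
Priority: 1
(
 {Token.string == "company"}
 (({Token.string == "called"}|
   {Token.string == "dubbed"}|
   {Token.string == "named"}
  )
 )?
)
(
 {Unknown.kind == PN}
)
:org
-->
{
  gate.AnnotationSet org = (gate.AnnotationSet) bindings.get("org");
  gate.FeatureMap features = Factory.newFeatureMap();
  features.put("rule ", "OrgContext1");

  // Create the annotation first; add() returns the id GATE assigned to it.
  Integer newId = outputAS.add(org.firstNode(), org.lastNode(), "Organization", features);

  // Look the new annotation up by id and store the id as a String feature,
  // matching the java.lang.String value shown in the XML in the question.
  gate.Annotation newOrg = outputAS.get(newId);
  newOrg.getFeatures().put("id", newId.toString());

  outputAS.removeAll(org);
}
```

The PersonFinal rule could presumably be changed the same way: keep the Integer returned by outputAS.add(...) and put its string form into the feature map in place of the WHAT_DO_I_PUT_HERE line.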
isixtova

It's quite simple. You have to mark the annotation on the left-hand side (LHS) of the rule with a label (token_match in my example below) and then, on the right-hand side (RHS) of the rule, obtain the corresponding AnnotationSet from the bindings variable, iterate through its annotations (usually there is only a single annotation in it), and copy the corresponding IDs to the output.

Phase: Main
Input: Token 

Rule: WriteTokenID
(
  ({Token}):token_match
)    
-->
{
  AnnotationSet as = bindings.get("token_match");
  for (Annotation a : as) 
  {
    FeatureMap features = Factory.newFeatureMap();
    features.put("origTokenId", a.getId());
    outputAS.add(a.getStartNode(), a.getEndNode(), "NewToken", features);   
  }
}

In your code, you probably want to label {TempPerson.kind == personName} and {Unknown.kind == PN}, somewhat like below.

(
 ({TempPerson.kind == personName}):temp_person
)
:person

and

(
 {Token.string == "company"}
 (({Token.string == "called"}|
   {Token.string == "dubbed"}|
   {Token.string == "named"}
  )
 )?
)
( 
 ({Unknown.kind == PN}):unknown_org
)
:org

And then use bindings.get("temp_person") and bindings.get("unknown_org") respectively.
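
As a rough sketch of how the labelled PersonFinal rule could use that binding (an untested illustration): the feature name origId and the variable names are hypothetical, and the gender handling from the question is omitted for brevity. Note that this records the id of the matched TempPerson annotation, which is not the id of the Person annotation created by the rule.

```
Rule: PersonFinal
Priority: 30
(
 ({TempPerson.kind == personName}):temp_person
)
:person
-->
{
  gate.AnnotationSet personSet = (gate.AnnotationSet) bindings.get("person");
  gate.AnnotationSet tempPersonSet = (gate.AnnotationSet) bindings.get("temp_person");
  // The inner label normally binds exactly one TempPerson annotation.
  gate.Annotation tempPersonAnn = tempPersonSet.iterator().next();

  gate.FeatureMap features = Factory.newFeatureMap();
  // Id of the matched TempPerson, stored under a hypothetical feature name.
  features.put("origId", tempPersonAnn.getId().toString());
  features.put("rule1", tempPersonAnn.getFeatures().get("rule"));
  features.put("rule", "PersonFinal");

  outputAS.add(personSet.firstNode(), personSet.lastNode(), "Person", features);
  outputAS.removeAll(personSet);
}
```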

dedek