How can I retrieve annotated text from a document in a structured way, as shown below? I am using the sentence as the unit of processing, meaning that I would like to retrieve specific pieces of text from each sentence and put them together later. I have already set up my annotations in GATE and saved the annotated results as inline XML.
My input XML file looks like this:
<Document>
<Paragraph>
<text id="100">30.03. Zeraua joins the Otjimbingwe and Omaruru Ovaherero at Samuel’s station at Ongandjira in the upper Swakop valley.</text>
<text id="101">01.04. Von Glasenapp’s unit proceeds in the direction of Otjikuoko without meeting the Tjetjo community.</text>
<text id="102">09.04. The battle of Ongandjira is fought with heavy losses on both sides. The Ovaherero have to give way before a sustained German artillery bombardment commences, and they escape in a northerly direction.</text>
</Paragraph>
<Paragraph>
<text id="200">30.03. Zeraua joins the Otjimbingwe and Omaruru Ovaherero at Samuel’s station at Ongandjira in the upper Swakop valley.</text>
<text id="201">01.04. Von Glasenapp’s unit proceeds in the direction of Otjikuoko without meeting the Tjetjo community.</text>
<text id="202">09.04. The battle of Ongandjira is fought with heavy losses on both sides. The Ovaherero have to give way before a sustained German artillery bombardment commences, and they escape in a northerly direction.</text>
</Paragraph>
</Document>
And this is the output structure I would like per sentence:
<text id="100">
<Event>Battle of Ongandjira</Event>
<Location>Ongandjira</Location>
<NumberDate>30.03</NumberDate>
<Person>Zeraua</Person>
</text>
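To make the target concrete, here is a rough Python sketch of the kind of transformation I am after. It is only an illustration under assumptions: it assumes the inline XML saved from GATE wraps each annotation in a tag named after its type (Event, Location, NumberDate, Person) nested inside the sentence's <text> element, which may not match the real export. The `SAMPLE` string and the `restructure` helper are hypothetical, not anything GATE provides.

```python
import xml.etree.ElementTree as ET

# Hypothetical inline export: annotation tags nested inside each <text> element.
# The real GATE output may look different (extra attributes, other tag names).
SAMPLE = """<Document>
<Paragraph>
<text id="100"><NumberDate>30.03</NumberDate>. <Person>Zeraua</Person> joins the Otjimbingwe and Omaruru Ovaherero at Samuel's station at <Location>Ongandjira</Location> in the upper Swakop valley.</text>
<text id="101"><NumberDate>01.04</NumberDate>. Von Glasenapp's unit proceeds in the direction of <Location>Otjikuoko</Location>.</text>
</Paragraph>
</Document>"""

# Annotation types to keep, in the order they should appear in the output.
WANTED = ("Event", "Location", "NumberDate", "Person")


def restructure(xml_string, wanted=WANTED):
    """Build a new tree with one <text> per sentence holding only its annotations."""
    root = ET.fromstring(xml_string)
    out = ET.Element("Document")
    for sentence in root.iter("text"):
        new_sentence = ET.SubElement(out, "text", id=sentence.get("id", ""))
        for tag in wanted:
            # iter() also finds annotations nested inside other annotations.
            for ann in sentence.iter(tag):
                child = ET.SubElement(new_sentence, tag)
                # itertext() flattens any nested markup into plain text.
                child.text = "".join(ann.itertext()).strip()
    return out


result = restructure(SAMPLE)
print(ET.tostring(result, encoding="unicode"))
```

The annotations are grouped by type within each sentence rather than kept in reading order, since that is how the desired output above is laid out.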
And these are my annotations in GATE:
My inline file just contains a lot of mixed-up annotations, and I can't figure out how to structure it in that order. I have tried the Format_Twitter JSON and it's a mess too.
Thanks a lot.