I have started learning GATE and I would like to use it to extract information from unstructured documents. The information I am interested in is dates, locations, event information and person names: I would like to find events that happened at a specific location on a specific date, together with the names of the people involved. I have been reading the GATE manual, which is how I got a glimpse of how to build a pipeline. However, I cannot figure out how to create my own annotation types and make sure they are added to a new annotation set that appears under the annotation sets on the right. I found similar questions such as GATE - How to create a new annotation SET? but they didn't help me either.
Here is what I have done so far:
- Created .lst files for my new named-entity types and put them in the ANNIE resources/gazetteer directory
- Added the .lst file descriptions to the list.def file
- Identified my patterns in the document, e.g. date formats like ddmm and dd.mm.yyyy
- Wrote a JAPE rule for each pattern in a separate .jape file
- Added the JAPE file names to the main.jape file
- Loaded the PR and my document into GATE
- Ran the application
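The gazetteer steps above boil down to mapping list entries to Lookup features. Below is a toy, stdlib-only Java sketch of that mapping, not GATE's API: the class `GazetteerSketch`, its methods and the file name `mymonths.lst` are hypothetical, and it assumes list.def lines of the form `file.lst:majorType:minorType`. What it illustrates is that a JAPE constraint such as the rule's `{Lookup.majorType=="Month"}` compares strings exactly, so it only fires if list.def declares the majorType as `Month`; as far as I know, the stock ANNIE lists declare month names with majorType `date` and minorType `month`.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

/**
 * Toy imitation of how a gazetteer turns list entries into Lookup features.
 * Hypothetical class, NOT part of the GATE API; it only mirrors the
 * "file.lst:majorType:minorType" layout of list.def lines.
 */
public class GazetteerSketch {

    /** The two features a Lookup annotation carries in this sketch. */
    static final class Lookup {
        final String majorType;
        final String minorType;

        Lookup(String majorType, String minorType) {
            this.majorType = majorType;
            this.minorType = minorType;
        }

        @Override
        public String toString() {
            return "Lookup{majorType=" + majorType + ", minorType=" + minorType + "}";
        }
    }

    private final Map<String, Lookup> entries = new HashMap<>();

    /** Registers one list.def line plus the entries its .lst file would hold. */
    void addList(String listDefLine, List<String> listEntries) {
        String[] parts = listDefLine.split(":");
        if (parts.length < 2) {
            throw new IllegalArgumentException("expected file.lst:majorType[:minorType]");
        }
        String major = parts[1];
        String minor = parts.length > 2 ? parts[2] : "";
        for (String entry : listEntries) {
            entries.put(entry, new Lookup(major, minor));
        }
    }

    /** Exact-string lookup of a token (case-sensitive in this sketch). */
    Optional<Lookup> lookup(String token) {
        return Optional.ofNullable(entries.get(token));
    }

    /** Mirrors the JAPE constraint {Lookup.majorType == "Month"}: exact equality. */
    static boolean matchesMonthMacro(Optional<Lookup> lookup) {
        return lookup.map(l -> l.majorType.equals("Month")).orElse(false);
    }

    public static void main(String[] args) {
        // Hypothetical custom list declared with majorType "Month".
        GazetteerSketch mine = new GazetteerSketch();
        mine.addList("mymonths.lst:Month", List.of("January", "Jan", "May"));

        // A list declared the ANNIE way, with majorType "date".
        GazetteerSketch annieStyle = new GazetteerSketch();
        annieStyle.addList("month.lst:date:month", List.of("January", "Jan", "May"));

        System.out.println(matchesMonthMacro(mine.lookup("May")));       // true
        System.out.println(matchesMonthMacro(annieStyle.lookup("May"))); // false
    }
}
```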
This is what my JAPE rule looks like for one date format:
Phase: datesearching
Input: Token Lookup SpaceToken
Options: control = appelt
////////////////////////////////////Macros
//Initialization of regular expressions
Macro: DAY_ONE
({Token.kind == number,Token.category==CD, Token.length == "1"})
Macro: DAY_TWO
({Token.kind == number,Token.category==CD, Token.length == "2"})
Macro: YEAR
({Token.kind == number,Token.category==CD, Token.length == "4"})
Macro: MONTH
({Lookup.majorType=="Month"})
Rule: ddmmyyyydash
(
(DAY_ONE|DAY_TWO)
({Token.string == ","}|{Token.string == "."} |{Token.string == "-"})
(MONTH)
({Token.string == ","}|{Token.string == "."} |{Token.string == "-"})
(YEAR)
)
:ddmmyyyydash
-->
:ddmmyyyydash.DateMonthYearDash= {rule = "ddmmyyyydash"}
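To sanity-check which strings the rule above is meant to catch, here is a rough Java regex approximation of `ddmmyyyydash`. This swaps JAPE for `java.util.regex`, so it is only an illustration: the month list is a hypothetical stand-in for the Month gazetteer, and it cannot model `Token.category==CD`, which depends on the POS tagger having run. It does reflect one JAPE detail: because `SpaceToken` is listed in `Input:` but never appears in the pattern, a date written with spaces such as `12 - Dec - 2021` should not match.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Rough regex approximation of the ddmmyyyydash JAPE rule (illustration only;
 * JAPE matches sequences of annotations, not characters).
 */
public class DateDashSketch {

    // Hypothetical stand-in for the entries of the "Month" gazetteer list.
    static final List<String> MONTHS =
        List.of("January", "Jan", "February", "Feb", "May", "December", "Dec");

    // (DAY_ONE|DAY_TWO) -> 1 or 2 digits
    // separator          -> exactly one of , . -
    // (MONTH)            -> one gazetteer entry
    // (YEAR)             -> exactly 4 digits
    // No whitespace is allowed anywhere, mirroring SpaceToken being in Input
    // but absent from the pattern.
    static final Pattern DDMMYYYY_DASH = Pattern.compile(
        "\\b(\\d{1,2})[,.-](" + String.join("|", MONTHS) + ")[,.-](\\d{4})\\b");

    /** Returns every substring the approximation would annotate as DateMonthYearDash. */
    static List<String> findDates(String text) {
        List<String> found = new ArrayList<>();
        Matcher m = DDMMYYYY_DASH.matcher(text);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        String text = "The summit opened on 3-May-2021 in Geneva; "
            + "a follow-up on 12 - Dec - 2021 was cancelled.";
        System.out.println(findDates(text)); // [3-May-2021]
    }
}
```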
Can someone please tell me what I should do to make sure that the DateMonthYearDash annotations are created in a new annotation set? Thanks a lot.
When I change the outputASName parameter of the JAPE Transducer, the new set does not appear like the others. This is how it looks: