
I have started learning the GATE application and I would like to use it to extract information from an unstructured document. The information I am interested in is dates, locations, event information and persons' names. I would like to get information about events that happened at a specific location on a specific date, together with the names of the people involved. I have been reading the GATE manual, and that is how I got a glimpse of how to build a pipeline. However, I cannot figure out how to create my own annotation types and make sure that they are added to a new annotation set, which should appear under the annotation sets on the right. I found similar questions like GATE - How to create a new annotation SET? but it didn't help me either.

Let me explain what I did so far:

  1. Created a .lst file for my new NEs and put it in the ANNIE resources/gazetteer directory
  2. Added the .lst file description to the list.def file
  3. Identified my patterns in the document, e.g. date formats like ddmm, dd.mm.yyyy
  4. Wrote a JAPE rule for each pattern in a separate .jape file
  5. Added the JAPE file names to the main.jape file
  6. Loaded the PR and my document into GATE
  7. Ran the application

This is what my JAPE rule looks like for one date format:

    Phase: datesearching
    Input: Token Lookup SpaceToken
    Options: control = appelt

    ////////////////////////////////////Macros
    //Initialization of regular expressions
    Macro: DAY_ONE
    ({Token.kind == number,Token.category==CD, Token.length == "1"})

    Macro: C
    ({Token.kind == number,Token.category==CD, Token.length == "2"})

    Macro: YEAR
    ({Token.kind == number,Token.category==CD, Token.length == "4"})

    Macro: MONTH
    ({Lookup.majorType=="Month"})

    Rule: ddmmyyydash
    (
        (DAY_ONE|DAY_TWO)
        ({Token.string == ","}|{Token.string == "."} |{Token.string == "-"})
        (MONTH)
        ({Token.string == ","}|{Token.string == "."} |{Token.string == "-"})
        (YEAR)
    )
    :ddmmyyyydash
    -->
        :ddmmyyyydash.DateMonthYearDash= {rule = "ddmmyyyydash"}

Can someone please help me with what I should do to make sure that DateMonthYearDash is created as a new annotation set? How do I do it? Thanks a lot.

When I change the outputASName of the JAPE Transducer, the new set does not appear like the rest. This is how it looks:

[screenshot: annotation set list]

1 Answer


As explained in the question you mention (GATE - How to create a new annotation SET?), you have two options:

  1. Change the outputASName of your JAPE transducer PR.
  2. Use the Annotation Set Transfer PR to copy or move the desired annotations from one annotation set to another.

JAPE function - explanation

A JAPE transducer (like many other GATE PRs) simply takes some input annotations and creates new output annotations based on them. The names of the input and output annotation sets are configured by the inputASName and outputASName run-time parameters: inputASName says where the PR looks for input annotations, and outputASName says where it puts the output annotations.

What should be where

The input annotation set must contain the necessary input annotations before the JAPE transducer PR is executed. These annotations are usually created by preceding PRs in the pipeline. Otherwise the transducer will not see the input annotations it needs and will not produce anything.

The output annotation set may be empty or may already contain anything before the JAPE execution; it doesn't matter. The important thing is that the new output annotations (DateMonthYearDash in your case) are created there by the time the JAPE transducer PR finishes executing.
So after a successful JAPE execution you should see the new annotations there.
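
To make the role of the output set more tangible, here is a minimal sketch of a rule with a Java right-hand side that adds the annotation to outputAS by hand. The phase and rule names are made up for illustration, and I dropped the Token.category constraint so that the sketch only assumes the tokenizer and the gazetteer have run before it:

    Phase: datesearchingjava
    Input: Token Lookup
    Options: control = appelt

    // Hypothetical rule name; the pattern mirrors the dash-separated date
    Rule: ddmmyyyydashJava
    (
        {Token.kind == number, Token.length == "1"}
        {Token.string == "-"}
        {Lookup.minorType == "month"}
        {Token.string == "-"}
        {Token.kind == number, Token.length == "4"}
    )
    :date
    -->
    {
        // "date" is the label bound on the left-hand side
        gate.AnnotationSet matched = (gate.AnnotationSet) bindings.get("date");
        gate.FeatureMap features = gate.Factory.newFeatureMap();
        features.put("rule", "ddmmyyyydashJava");
        // outputAS is the set selected by the outputASName run-time parameter
        outputAS.add(matched.firstNode(), matched.lastNode(),
                     "DateMonthYearDash", features);
    }

With outputASName left empty the annotation should land in the default set; with outputASName set to monthdateyear it should land in the set of that name. The short form :date.DateMonthYearDash = {...} does the same thing with less typing.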

Some terminology

Note that annotation sets have names, while annotations have a type, an id, offsets, features and the annotation set they belong to.
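
As a sketch of how these terms map onto a rule (phase name, rule name and the monthType feature are made up for illustration): the right-hand side names the annotation type and its features, while the annotation set is not mentioned in the grammar at all, because it comes from outputASName.

    Phase: datesearchingfeatures
    Input: Token Lookup
    Options: control = appelt

    Rule: ddmmyyyydashFeatures
    (
        {Token.kind == number, Token.length == "1"}
        {Token.string == "-"}
        {Lookup.minorType == "month"}
        {Token.string == "-"}
        {Token.kind == number, Token.length == "4"}
    )
    :date
    -->
        // type:     DateMonthYearDash
        // features: rule (a constant) and monthType (copied from the matched Lookup)
        // offsets:  taken from the span bound to the label "date"
        :date.DateMonthYearDash = {rule = "ddmmyyyydashFeatures", monthType = :date.Lookup.minorType}

The id is assigned by GATE when the annotation is created.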


JAPE correction

I found some issues in your JAPE grammar:

  1. Don't include SpaceToken in Input unless you explicitly match it in your grammar or you are sure no spaces will occur inside the pattern. See also: Concept of Space Token in JAPE
  2. ({Lookup.majorType=="Month"}) -> ({Lookup.minorType=="month"})
  3. (DAY_ONE|DAY_TWO) -> (DAY_ONE), because no DAY_TWO macro is defined in your grammar
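
For completeness, this is roughly what the first point looks like if you do want to keep SpaceToken in Input: every place where a space may occur then has to be matched explicitly. It is only a sketch; the phase name, the rule name and the SEP macro are made up, and I left out the Token.category constraint for brevity.

    Phase: datesearchingspaces
    Input: Token Lookup SpaceToken
    Options: control = appelt

    // Hypothetical helper: a separator with optional spaces on both sides
    Macro: SEP
    (
        ({SpaceToken})*
        ({Token.string == ","} | {Token.string == "."} | {Token.string == "-"})
        ({SpaceToken})*
    )

    Rule: ddmmyyyydashSpaces
    (
        ({Token.kind == number, Token.length == "1"})
        (SEP)
        ({Lookup.minorType == "month"})
        (SEP)
        ({Token.kind == number, Token.length == "4"})
    )
    :date
    -->
        :date.DateMonthYearDash = {rule = "ddmmyyyydashSpaces"}

Leaving SpaceToken out of Input is usually simpler, because the spaces are then skipped for you.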

After the corrections, and after running the ANNIE pipeline, the output for the document 9 - January - 2017 is: [screenshot: GATE doc output]

JAPE grammar after corrections:

    Phase: datesearching
    Input: Token Lookup
    Options: control = appelt

    Macro: DAY_ONE
    ({Token.kind == number,Token.category==CD, Token.length == "1"})

    Macro: YEAR
    ({Token.kind == number,Token.category==CD, Token.length == "4"})

    Macro: MONTH
    ({Lookup.minorType=="month"})

    Rule: ddmmyyydash
    (
        (DAY_ONE)
        ({Token.string == ","}|{Token.string == "."} |{Token.string == "-"})
        (MONTH)
        ({Token.string == ","}|{Token.string == "."} |{Token.string == "-"})
        (YEAR)
    )
    :ddmmyyyydash
    -->
        :ddmmyyyydash.DateMonthYearDash= {rule = "ddmmyyyydash"}
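
If two-digit days should match as well, the alternative (DAY_ONE|DAY_TWO) can come back once a DAY_TWO macro actually exists. A sketch, assuming the same ANNIE annotations as above; the phase name and rule name are made up:

    Phase: datesearchingtwodigits
    Input: Token Lookup
    Options: control = appelt

    Macro: DAY_ONE
    ({Token.kind == number, Token.category == CD, Token.length == "1"})

    // Same as DAY_ONE, but for days written with two digits
    Macro: DAY_TWO
    ({Token.kind == number, Token.category == CD, Token.length == "2"})

    Macro: YEAR
    ({Token.kind == number, Token.category == CD, Token.length == "4"})

    Macro: MONTH
    ({Lookup.minorType == "month"})

    Rule: ddmmyyyydashTwoDigits
    (
        (DAY_ONE | DAY_TWO)
        ({Token.string == ","} | {Token.string == "."} | {Token.string == "-"})
        (MONTH)
        ({Token.string == ","} | {Token.string == "."} | {Token.string == "-"})
        (YEAR)
    )
    :date
    -->
        :date.DateMonthYearDash = {rule = "ddmmyyyydashTwoDigits"}

Note that this does not check that the number is a valid day; it only checks the token length.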

What to do when JAPE does not produce anything

You have to investigate the input annotations and "debug" your JAPE grammar. Usually some expected input annotation is missing, or there is an extra annotation you did not expect. GATE has a nice view for this purpose: the annotation stack. Features of input annotations can also have a different name or value than you expected (e.g. which is correct: {Lookup.majorType=="Month"} or {Lookup.minorType=="month"}?).

By "debugging" a JAPE grammar I mean: simplify the rule until it starts working, trying it each time on a simple document where it should certainly match. In your case you can try it without the (DAY_ONE) part. If it still doesn't work, try only (MONTH)({Token.string == "-"})(YEAR), or even (MONTH) alone, and so on, until you find the mistake in the grammar.
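
One way to do this simplification in a single run is a throwaway phase with control = all, so that every rule that matches fires and leaves its own annotation type behind. This is only a sketch; the phase name, the rule names and the Debug... types are made up:

    Phase: datedebug
    Input: Token Lookup
    Options: control = all

    // Simplest possible check: is the month Lookup there at all?
    Rule: debugMonth
    (
        {Lookup.minorType == "month"}
    )
    :m
    -->
        :m.DebugMonth = {rule = "debugMonth"}

    // One step further: month, dash, four-digit number
    Rule: debugMonthYear
    (
        {Lookup.minorType == "month"}
        {Token.string == "-"}
        {Token.kind == number, Token.length == "4"}
    )
    :my
    -->
        :my.DebugMonthYear = {rule = "debugMonthYear"}

If DebugMonth shows up but DebugMonthYear does not, the problem is in the part that was added between the two rules.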

dedek
  • I have changed the outputASName but it's not appearing like the rest of the sets. I edited the question to add the screenshot. – Nampa Gwakondo Jul 24 '17 at 10:51
  • @NampaGwakondo: on the screenshot, the `monthdateyear` set is **empty**. Did the JAPE find something in this document? How does it look with the default outputASName? – dedek Jul 24 '17 at 13:57
  • I set the outputASName to monthdateyear. Or what should I set the outputASName to, and what should be in the monthdateyear set? This is confusing me. – Nampa Gwakondo Jul 25 '17 at 10:19
  • @NampaGwakondo see my edits... and can you compare your results with different settings of outputASName? Does it work better with either setting (according to you)? Have you seen the `DateMonthYearDash` annotation anywhere? – dedek Jul 25 '17 at 11:38
  • Thanks for the clear explanation. I edited my JAPE rule like you suggested but the set is still just inactive, like on my screenshot. What could be the problem really? – Nampa Gwakondo Jul 25 '17 at 15:30
  • Sorry for the confusion between MonthDateYear and DateMonthYearDash; they are different date patterns I am trying to annotate. Even for just DateMonthYearDash it's inactive. – Nampa Gwakondo Jul 25 '17 at 16:02
  • I actually came back to your comments again when it wasn't working, and I decided to just keep trying and do it over... It works and I am happy to have all my date patterns now. I changed to `{Lookup.minorType=="month"}` and everything is fine. Thank you! – Nampa Gwakondo Jul 26 '17 at 10:21