Looking to Reason / Extract Information from Entity and Part of Speech Tagged Text

Question

Let us say I start with the following text:

I love Toyota Camrys and hate Ferraris

I use a POS tagger like Stanford CoreNLP and get the following Annotations:

I_PRP love_VBP Toyota_NNP Camrys_NNPS and_CC hate_VB Ferraris_NNP

Let us assume I have a Named Entity Recognizer and am able to identify a Camry and a Ferrari from the above annotations.

I want to be able to reason about the above sentence where for example I deduce the following:

I love Camrys
I hate Ferraris

possibly even:

I love something manufactured by Toyota
I hate something manufactured by Ferrari

I am currently doing the above using manually coded heuristics and slot matching.

Question: Is there a more standard way to accomplish this?

For example, I ran into JAPE (Java Annotation Patterns Engine) from GATE. Is that part of the tool chain for doing something like this?

@ChthonicProject, -- I guess I should add it to my reading list. — user1172468, Jun 10 '14 at 17:48

score 3 · Accepted Answer · answered Jun 10 '14 at 16:45

There are 2 ways to do that:

1) Write your own JAPE grammars. This is not as hard as it looks. There are many JAPE manuals on the web; the first Google result for "gate jape manual" seems fine as a starting point. In addition, the existing JAPE grammars from GATE ANNIE provide good examples and ideas for your task.

To start, create your own dictionary for the GATE Gazetteer with entries for brand names (Toyota, Ferrari, etc.) so that it creates "Lookup" annotations. Your JAPE grammar would then contain rules like

    Rule: LoveBrand
    (
      {Token.kind == word, Token.string == "I"}
      {Token.kind == word, Token.string == "love"}
      {Lookup.majorType == "brand"}
    ):label
    -->
    :label.Preference = {rule = "LoveBrand"}
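To make the idea behind the rule concrete, here is a minimal plain-Python sketch of the same pattern. It does not use GATE at all: the GAZETTEER dictionary, lookup and match_love_brand are hypothetical names made up for illustration, and whitespace splitting stands in for a real tokenizer.

```python
# Hypothetical stand-in for a GATE gazetteer list: surface form -> majorType.
GAZETTEER = {"toyota": "brand", "ferrari": "brand", "ferraris": "brand"}


def lookup(token):
    """Return the majorType for a token, or None (a toy 'Lookup' annotation)."""
    return GAZETTEER.get(token.lower())


def match_love_brand(tokens):
    """Find every 'I love <brand>' span, mirroring the LoveBrand pattern."""
    matches = []
    for i in range(len(tokens) - 2):
        if (
            tokens[i] == "I"
            and tokens[i + 1] == "love"
            and lookup(tokens[i + 2]) == "brand"
        ):
            matches.append({"rule": "LoveBrand", "span": tokens[i:i + 3]})
    return matches


tokens = "I love Toyota Camrys and hate Ferraris".split()
print(match_love_brand(tokens))
# [{'rule': 'LoveBrand', 'span': ['I', 'love', 'Toyota']}]
```

Note that this surface pattern finds "I love Toyota" but misses "hate Ferraris", because no "I" directly precedes "hate". That limitation is one reason to look at the dependency-based approach in 2).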

2) Use the Parser_Stanford plugin in GATE. It creates two types of annotations: Dependencies and TreeNodes. Dependencies are typed links between pairs of words; TreeNodes are dependencies collapsed into trees. Try the Parser_Stanford plugin in the GATE Developer GUI and you will get an idea of how to use it for your task.

You can process your "I love Toyota Camrys and hate Ferraris." on this demo page to see what the Stanford parser can do. In particular, you need dependencies of type dobj (direct object). There is a Stanford dependencies manual describing all possible dependency types if you want to use others.
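As a rough sketch of how dobj links can be turned into "who feels what about which thing" facts, the Python below walks a list of dependency triples. The triples are hand-written in the style of Stanford dependencies for the example sentence and are an assumption, not actual parser output; extract_opinions is a hypothetical helper, and words are used as keys only because the sentence has no repeated words (real parser output indexes each token).

```python
# Hand-written (relation, governor, dependent) triples in the style of
# Stanford dependencies. This is an assumption, NOT actual parser output.
DEPENDENCIES = [
    ("nsubj", "love", "I"),
    ("nn", "Camrys", "Toyota"),
    ("dobj", "love", "Camrys"),
    ("conj", "love", "hate"),
    ("dobj", "hate", "Ferraris"),
]


def extract_opinions(deps):
    """Return (subject, verb, object, noun_modifier) for every dobj link."""
    subject_of = {gov: dep for rel, gov, dep in deps if rel == "nsubj"}
    # A conjoined verb ("hate") borrows the subject of the verb it is
    # joined to ("love") when it has no nsubj of its own.
    conj_head = {dep: gov for rel, gov, dep in deps if rel.startswith("conj")}
    # Noun compound modifiers, e.g. "Toyota" modifying "Camrys".
    modifier_of = {gov: dep for rel, gov, dep in deps if rel == "nn"}

    opinions = []
    for rel, verb, obj in deps:
        if rel != "dobj":
            continue
        subject = subject_of.get(verb) or subject_of.get(conj_head.get(verb))
        opinions.append((subject, verb, obj, modifier_of.get(obj)))
    return opinions


for opinion in extract_opinions(DEPENDENCIES):
    print(opinion)
# ('I', 'love', 'Camrys', 'Toyota')
# ('I', 'hate', 'Ferraris', None)
```

The fourth element carries the noun modifier when there is one, which is enough to derive a statement such as "I love something manufactured by Toyota". For "Ferraris" the manufacturer would have to come from a separate lookup, such as a gazetteer.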

The Parser_Stanford plugin for GATE only adds annotations for Stanford dependencies to your document. To process those annotations, add a GATE transducer processing resource with your JAPE grammars to your sequence of processing resources in GATE Developer, after Parser_Stanford.

Many thanks for your detailed response. Your response essentially says to go the GATE/JAPE route. Are there any competing approaches? Other open source tools? — user1172468, Jun 10 '14 at 17:50
@user1172468 - you can take a look at the UIMA (http://uima.apache.org), OpenNLP (http://opennlp.apache.org) and NLTK (http://www.nltk.org) projects. — andrey, Jun 11 '14 at 08:14
Thanks, I have looked at OpenNLP and NLTK but not at UIMA. — user1172468, Jun 11 '14 at 16:32
