I'm processing a simple sentence to test Stanford's RelationExtractor:

Microsoft is based in New York.

(it's not)

When I annotate the sentence in Java, using the CoreNLP jar files directly, I get the expected result: CoreNLP finds an OrgBased_In relation between Microsoft and New York.

for (CoreMap sentence : sentences) {
    // Read the label of the first relation mention found in this sentence.
    String relationType = sentence.get(MachineReadingAnnotations.RelationMentionsAnnotation.class).get(0).getType(); // => OrgBased_In
}

However, sending the same sentence into the CoreNLP Server like so:

curl --data 'Microsoft is based in New York.' 'http://localhost:9000/?properties={%22annotators%22%3A%22tokenize%2Cssplit%2Cpos%2Clemma%2Cner%2Cparse%2Cdepparse%2Crelation%22%2C%22outputFormat%22%3A%22json%22}' -o -

results in a JSON response that contains no relation data at all:

{'sentences': [{'basicDependencies': [{'dep': 'ROOT',
                                   'dependent': 3,
                                   'dependentGloss': 'based',
                                   'governor': 0,
                                   'governorGloss': 'ROOT'},
                                  {'dep': 'nsubjpass',
                                   'dependent': 1,
                                   'dependentGloss': 'Microsoft',
                                   'governor': 3,
                                   'governorGloss': 'based'},
                                  {'dep': 'auxpass',
                                   'dependent': 2,
                                   'dependentGloss': 'is',
                                   'governor': 3,
                                   'governorGloss': 'based'},
                                  {'dep': 'case',
                                   'dependent': 4,
                                   'dependentGloss': 'in',
                                   'governor': 6,
                                   'governorGloss': 'York'},
                                  {'dep': 'compound',
                                   'dependent': 5,
                                   'dependentGloss': 'New',
                                   'governor': 6,
                                   'governorGloss': 'York'},
                                  {'dep': 'nmod',
                                   'dependent': 6,
                                   'dependentGloss': 'York',
                                   'governor': 3,
                                   'governorGloss': 'based'},
                                  {'dep': 'punct',
                                   'dependent': 7,
                                   'dependentGloss': '.',
                                   'governor': 3,
                                   'governorGloss': 'based'}],
            'enhancedDependencies': [{'dep': 'ROOT',
                                      'dependent': 3,
                                      'dependentGloss': 'based',
                                      'governor': 0,
                                      'governorGloss': 'ROOT'},
                                     {'dep': 'nsubjpass',
                                      'dependent': 1,
                                      'dependentGloss': 'Microsoft',
                                      'governor': 3,
                                      'governorGloss': 'based'},
                                     {'dep': 'auxpass',
                                      'dependent': 2,
                                      'dependentGloss': 'is',
                                      'governor': 3,
                                      'governorGloss': 'based'},
                                     {'dep': 'case',
                                      'dependent': 4,
                                      'dependentGloss': 'in',
                                      'governor': 6,
                                      'governorGloss': 'York'},
                                     {'dep': 'compound',
                                      'dependent': 5,
                                      'dependentGloss': 'New',
                                      'governor': 6,
                                      'governorGloss': 'York'},
                                     {'dep': 'nmod:in',
                                      'dependent': 6,
                                      'dependentGloss': 'York',
                                      'governor': 3,
                                      'governorGloss': 'based'},
                                     {'dep': 'punct',
                                      'dependent': 7,
                                      'dependentGloss': '.',
                                      'governor': 3,
                                      'governorGloss': 'based'}],
            'enhancedPlusPlusDependencies': [{'dep': 'ROOT',
                                              'dependent': 3,
                                              'dependentGloss': 'based',
                                              'governor': 0,
                                              'governorGloss': 'ROOT'},
                                             {'dep': 'nsubjpass',
                                              'dependent': 1,
                                              'dependentGloss': 'Microsoft',
                                              'governor': 3,
                                              'governorGloss': 'based'},
                                             {'dep': 'auxpass',
                                              'dependent': 2,
                                              'dependentGloss': 'is',
                                              'governor': 3,
                                              'governorGloss': 'based'},
                                             {'dep': 'case',
                                              'dependent': 4,
                                              'dependentGloss': 'in',
                                              'governor': 6,
                                              'governorGloss': 'York'},
                                             {'dep': 'compound',
                                              'dependent': 5,
                                              'dependentGloss': 'New',
                                              'governor': 6,
                                              'governorGloss': 'York'},
                                             {'dep': 'nmod:in',
                                              'dependent': 6,
                                              'dependentGloss': 'York',
                                              'governor': 3,
                                              'governorGloss': 'based'},
                                             {'dep': 'punct',
                                              'dependent': 7,
                                              'dependentGloss': '.',
                                              'governor': 3,
                                              'governorGloss': 'based'}],
            'index': 0,
            'parse': '(ROOT\n'
                     '  (S\n'
                     '    (NP (NNP Microsoft))\n'
                     '    (VP (VBZ is)\n'
                     '      (VP (VBN based)\n'
                     '        (PP (IN in)\n'
                     '          (NP (NNP New) (NNP York)))))\n'
                     '    (. .)))',
            'tokens': [{'after': ' ',
                        'before': '',
                        'characterOffsetBegin': 0,
                        'characterOffsetEnd': 9,
                        'index': 1,
                        'lemma': 'Microsoft',
                        'ner': 'ORGANIZATION',
                        'originalText': 'Microsoft',
                        'pos': 'NNP',
                        'word': 'Microsoft'},
                       {'after': ' ',
                        'before': ' ',
                        'characterOffsetBegin': 10,
                        'characterOffsetEnd': 12,
                        'index': 2,
                        'lemma': 'be',
                        'ner': 'O',
                        'originalText': 'is',
                        'pos': 'VBZ',
                        'word': 'is'},
                       {'after': ' ',
                        'before': ' ',
                        'characterOffsetBegin': 13,
                        'characterOffsetEnd': 18,
                        'index': 3,
                        'lemma': 'base',
                        'ner': 'O',
                        'originalText': 'based',
                        'pos': 'VBN',
                        'word': 'based'},
                       {'after': ' ',
                        'before': ' ',
                        'characterOffsetBegin': 19,
                        'characterOffsetEnd': 21,
                        'index': 4,
                        'lemma': 'in',
                        'ner': 'O',
                        'originalText': 'in',
                        'pos': 'IN',
                        'word': 'in'},
                       {'after': ' ',
                        'before': ' ',
                        'characterOffsetBegin': 22,
                        'characterOffsetEnd': 25,
                        'index': 5,
                        'lemma': 'New',
                        'ner': 'LOCATION',
                        'originalText': 'New',
                        'pos': 'NNP',
                        'word': 'New'},
                       {'after': '',
                        'before': ' ',
                        'characterOffsetBegin': 26,
                        'characterOffsetEnd': 30,
                        'index': 6,
                        'lemma': 'York',
                        'ner': 'LOCATION',
                        'originalText': 'York',
                        'pos': 'NNP',
                        'word': 'York'},
                       {'after': '',
                        'before': '',
                        'characterOffsetBegin': 30,
                        'characterOffsetEnd': 31,
                        'index': 7,
                        'lemma': '.',
                        'ner': 'O',
                        'originalText': '.',
                        'pos': '.',
                        'word': '.'}]}]}
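
For reference, the curl request above can be reproduced from Java, which makes it easier to compare the server output with the in-process result. This is a minimal sketch, assuming Java 11 or later and a server listening on http://localhost:9000 as in the curl command; the class name `CoreNlpServerRequest` and the helper `buildUrl` are hypothetical names introduced here, not part of CoreNLP. It only builds the percent-encoded `properties` query string and posts the sentence; it does not change what the server returns.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;

class CoreNlpServerRequest {

    // Hypothetical helper: builds the server URL with the properties JSON
    // percent-encoded into the "properties" query parameter.
    static String buildUrl(String baseUrl, String annotators, String outputFormat) {
        String props = "{\"annotators\":\"" + annotators
                + "\",\"outputFormat\":\"" + outputFormat + "\"}";
        return baseUrl + "/?properties=" + URLEncoder.encode(props, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws Exception {
        // Same annotator list and output format as the curl command.
        String url = buildUrl("http://localhost:9000",
                "tokenize,ssplit,pos,lemma,ner,parse,depparse,relation", "json");

        // POST the raw sentence as the request body.
        HttpRequest request = HttpRequest.newBuilder(URI.create(url))
                .POST(HttpRequest.BodyPublishers.ofString("Microsoft is based in New York."))
                .build();

        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());

        // Print the raw JSON so it can be inspected for relation data.
        System.out.println(response.body());
    }
}
```

Running `main` requires the server to be up. `buildUrl` can be checked on its own, and it also encodes the braces, which the curl command leaves as literal characters.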

I can see on the CoreNLP server terminal that the relation extraction model is loaded.

[pool-1-thread-1] INFO edu.stanford.nlp.pipeline.RelationExtractorAnnotator - Loading relation model from edu/stanford/nlp/models/supervised_relation_extractor/roth_relation_model_pipelineNER.ser

What am I missing here?

Thanks!

Simon

1 Answer

I think the underlying issue is that nobody has added that annotator's output to the JSON output yet, which is something we can do eventually.

Right now, the relation extraction we mainly support is the new kbp annotator, which extracts the relations defined in the TAC-KBP challenge.

You can find the relation descriptions here: https://tac.nist.gov//2015/KBP/ColdStart/guidelines/TAC_KBP_2015_Slot_Descriptions_V1.0.pdf

Here is an example command that I ran:

java -Xmx8g edu.stanford.nlp.pipeline.StanfordCoreNLP -annotators tokenize,ssplit,pos,lemma,ner,parse,mention,entitymentions,coref,kbp -file microsoft-example.txt -outputFormat json

If you look at the JSON you'll see the proper relation has been extracted.

StanfordNLPHelp
    My eventual goal is to train a relation extractor model on new relation types, so I don't think KBP is what I'm looking for. I guess I'll have to roll my own wrapper for the relation extractor. Thanks for the quick reply! – Simon Jan 17 '17 at 11:02