Stanford Core NLP missing ROOTs

Question

The Stanford CoreNLP online demo, given the example sentence "A minimal software item that can be tested in isolation", produces the following collapsed dependencies with CC processed:

root ( ROOT-0 , item-4 )
det ( item-4 , A-1 )
amod ( item-4 , minimal-2 )
nn ( item-4 , software-3 )
nsubjpass ( tested-8 , that-5 )
aux ( tested-8 , can-6 )
auxpass ( tested-8 , be-7 )
rcmod ( item-4 , tested-8 )
prep_in ( tested-8 , isolation-10 )

My Java class produces the same output except for the root(...) line. The code I am running is as follows:

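import java.util.List;
import java.util.Properties;

import edu.stanford.nlp.ling.CoreAnnotations;
import edu.stanford.nlp.pipeline.Annotation;
import edu.stanford.nlp.pipeline.StanfordCoreNLP;
// Older releases keep these two classes in edu.stanford.nlp.trees.semgraph instead.
import edu.stanford.nlp.semgraph.SemanticGraph;
import edu.stanford.nlp.semgraph.SemanticGraphCoreAnnotations;
import edu.stanford.nlp.util.CoreMap;

public static void main(String[] args) {
    Properties props = new Properties();
    props.put("annotators", "tokenize, ssplit, pos, lemma, ner, parse, dcoref");
    StanfordCoreNLP pipeline = new StanfordCoreNLP(props);

    // The text to analyse is passed as the first command-line argument.
    Annotation document = new Annotation(args[0]);

    pipeline.annotate(document);

    List<CoreMap> sentences = document.get(CoreAnnotations.SentencesAnnotation.class);

    // Print the collapsed, CC-processed dependencies of each sentence.
    for (CoreMap sentence : sentences) {
        SemanticGraph dependencies = sentence.get(SemanticGraphCoreAnnotations.CollapsedCCProcessedDependenciesAnnotation.class);
        System.out.println(dependencies.toList());
    }
}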

So the question is: why doesn't my Java code output the roots? Am I missing something?

score 3 · Accepted Answer · answered May 01 '13 at 05:17

This is a good question, in the sense that it exposes a flaw in the current code. At present, the root node and the edge from it are not stored in the graph.* Instead, the root (or roots) of the graph are kept in a separate list and have to be accessed separately. Here are two things that will work: (1) Add this code above the System.out.println:

IndexedWord root = dependencies.getFirstRoot();
System.out.printf("root(ROOT-0, %s-%d)%n", root.word(), root.index());

(2) Replace your current System.out.println line with:

System.out.println(dependencies.toString("readable"));

Unlike toList() and the other toString() variants, this format does print the root(s).

*There are historical reasons for this: We used to not have any explicit root. But at this point the behavior is awkward and dysfunctional and should be changed. It'll probably happen in a future release.
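To make the "roots are stored separately from the edges" point concrete, here is a minimal standalone sketch. It does not use CoreNLP at all: `RootLines`, `Token`, `rootLine` and `withRoots` are hypothetical names invented for illustration, and the edge lines are assumed to be pre-formatted in the reln(gov-i, dep-j) style shown in the question.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in, not part of CoreNLP: it only mirrors the idea that
// a graph keeps its roots in one list and its edges in another.
final class RootLines {

    /** A word plus its 1-based sentence position (a stand-in for IndexedWord). */
    static final class Token {
        final String word;
        final int index;

        Token(String word, int index) {
            this.word = word;
            this.index = index;
        }
    }

    /** Formats one root as a dependency line hanging off the artificial ROOT-0 node. */
    static String rootLine(Token root) {
        return "root(ROOT-0, " + root.word + "-" + root.index + ")";
    }

    /** Returns one line per root, followed by the already formatted edge lines. */
    static List<String> withRoots(List<Token> roots, List<String> edgeLines) {
        List<String> out = new ArrayList<>();
        for (Token root : roots) {
            out.add(rootLine(root));
        }
        out.addAll(edgeLines);
        return out;
    }

    public static void main(String[] args) {
        List<Token> roots = Arrays.asList(new Token("item", 4));
        List<String> edges = Arrays.asList("det(item-4, A-1)", "amod(item-4, minimal-2)");
        for (String line : withRoots(roots, edges)) {
            System.out.println(line);
        }
    }
}
```

With the real library, the roots would presumably come from `dependencies.getRoots()` rather than a hand-built list, reading `word()` and `index()` from each `IndexedWord`. Looping over all roots instead of calling `getFirstRoot()` also covers graphs that have more than one root.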

I managed to find another solution for my case: `GrammaticalStructure gs = gsf.newGrammaticalStructure(tree);` `Collection<TypedDependency> tdl = gs.typedDependenciesCCprocessed();` — werd, May 01 '13 at 22:06
Yes, that works well, since the ROOT really is in that collection of dependencies. The minor cost is that you are paying for them to be generated a second time from the parse tree. — Christopher Manning, May 02 '13 at 22:43
