
If I take the example from the homepage:

The strongest rain ever recorded in India shut down 
the financial hub of Mumbai, snapped communication 
lines, closed airports and forced thousands of people 
to sleep in their offices or walk home during the night, 
officials said today.

The Stanford parser:

LexicalizedParser lexicalizedParser = LexicalizedParser.loadModel("edu/stanford/nlp/models/lexparser/englishPCFG.ser.gz");

Tree parse = lexicalizedParser.parse(text);
TreePrint treePrint = new TreePrint("penn, typedDependencies");

treePrint.printTree(parse);

Delivers the following tree:

(ROOT
(S
  (S
    (NP
      (NP (DT The) (JJS strongest) (NN rain))
      (VP
        (ADVP (RB ever))
        (VBN recorded)
        (PP (IN in)
          (NP (NNP India)))))
    (VP
      (VP (VBD shut)
        (PRT (RP down))
        (NP
          (NP (DT the) (JJ financial) (NN hub))
          (PP (IN of)
            (NP (NNP Mumbai)))))
      (, ,)
      (VP (VBD snapped)
        (NP (NN communication) (NNS lines)))
      (, ,)
      (VP (VBD closed)
        (NP (NNS airports)))
      (CC and)
      (VP (VBD forced)
        (NP
          (NP (NNS thousands))
          (PP (IN of)
            (NP (NNS people))))
        (S
          (VP (TO to)
            (VP
              (VP (VB sleep)
                (PP (IN in)
                  (NP (PRP$ their) (NNS offices))))
              (CC or)
              (VP (VB walk)
                (NP (NN home))
                (PP (IN during)
                  (NP (DT the) (NN night))))))))))
  (, ,)
  (NP (NNS officials))
  (VP (VBD said)
    (NP-TMP (NN today)))
  (. .)))

I now want to split the tree according to its structure to get the clauses. In this example I want to split the tree into the following parts:

  • The strongest rain ever recorded in India
  • The strongest rain shut down the financial hub of Mumbai
  • The strongest rain snapped communication lines
  • The strongest rain closed airports
  • The strongest rain forced thousands of people to sleep in their offices
  • The strongest rain forced thousands of people to walk home during the night

How can I do that?


The first answer suggested using a recursive algorithm to print all root-to-leaf paths.

Here is the code I tried:

public static void main(String[] args) throws IOException {
    LexicalizedParser lexicalizedParser = LexicalizedParser.loadModel("edu/stanford/nlp/models/lexparser/englishPCFG.ser.gz");

    Tree tree = lexicalizedParser.parse("In a ceremony that was conspicuously short on pomp and circumstance at a time of austerity, Felipe, 46, took over from his father, King Juan Carlos, 76.");

    printAllRootToLeafPaths(tree, new ArrayList<String>());
}

private static void printAllRootToLeafPaths(Tree tree, List<String> path) {
    if(tree != null) {
        if(tree.isLeaf()) {
            path.add(tree.nodeString());
        }

        if(tree.children().length == 0) {
            System.out.println(path);
        } else {
            for(Tree child : tree.children()) {
                printAllRootToLeafPaths(child, path);
            }
        }

        path.remove(tree.nodeString());
    }
}

Of course this code is illogical: if I only add the leaves to the path, there is never a recursive call below them, because leaves have no children. The problem here is that all the actual words are leaves, so this algorithm just prints the single words:

[The]
[strongest]
[rain]
[ever]
[recorded]
[in]
[India]
[shut]
[down]
[the]
[financial]
[hub]
[of]
[Mumbai]
[,]
[snapped]
[communication]
[lines]
[,]
[closed]
[airports]
[and]
[forced]
[thousands]
[of]
[people]
[to]
[sleep]
[in]
[their]
[offices]
[or]
[walk]
[home]
[during]
[the]
[night]
[,]
[officials]
[said]
[today]
[.]
Mulgard

1 Answer

Take a look at how to print all root-to-leaf paths in a binary tree, or at splitting a binary tree.
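A minimal sketch of the root-to-leaf idea for a tree whose nodes can have any number of children. The Node class here is a hypothetical stand-in, not the Stanford Tree, so that the example runs without the parser; the class and method names are assumptions made for illustration. Every node is pushed onto the path on the way down and popped by index on the way back up.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class RootToLeafPaths {

    // Hypothetical stand-in for a parse-tree node: a label plus any number of children.
    static final class Node {
        final String label;
        final List<Node> children;

        Node(String label, Node... children) {
            this.label = label;
            this.children = Arrays.asList(children);
        }

        boolean isLeaf() {
            return children.isEmpty();
        }
    }

    // Collects every root-to-leaf path as a list of labels.
    static List<List<String>> paths(Node root) {
        List<List<String>> result = new ArrayList<>();
        collect(root, new ArrayList<>(), result);
        return result;
    }

    private static void collect(Node node, List<String> path, List<List<String>> result) {
        path.add(node.label);                   // push every node, not only the leaves
        if (node.isLeaf()) {
            result.add(new ArrayList<>(path));  // copy, because path is mutated afterwards
        } else {
            for (Node child : node.children) {
                collect(child, path, result);
            }
        }
        path.remove(path.size() - 1);           // backtrack by index, not by value
    }

    public static void main(String[] args) {
        Node tree = new Node("NP",
                new Node("DT", new Node("The")),
                new Node("JJS", new Node("strongest")),
                new Node("NN", new Node("rain")));
        for (List<String> path : paths(tree)) {
            System.out.println(path);
        }
        // prints:
        // [NP, DT, The]
        // [NP, JJS, strongest]
        // [NP, NN, rain]
    }
}
```

Removing by index matters because the same label (for example NP or VP) can appear several times on one path, and removing by value would drop the first match rather than the last entry.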

nefo_x
  • The Stanford edu.stanford.nlp.trees.Tree is not a binary tree, but of course you can still use such a recursive algorithm to print your tree with all possible combinations from root to leaf. The problem here is that the nodes are not the words. The nodes are the tags and the words are always leaves. So you get all the nodes as tags, and only the last node of a path (the leaf) is a word. – Mulgard Jun 24 '14 at 11:01
  • So if you use this algorithm you end up with: The, The strongest, The strongest rain, ... – Mulgard Jun 24 '14 at 11:06
  • You can consider noun phrases and verb phrases as root nodes. You can also split a VP on CC or , tokens. – nefo_x Jun 24 '14 at 11:23
  • You are using `tree.children().length` to decide whether to start a new round of recursion. What I suggest is to check for NP / VP and CC. Can you upload your sample code somewhere, e.g. to https://gist.github.com/, so I'll be able to spin up an env? – nefo_x Jun 24 '14 at 19:27
  • Do you mean this? [gist github](https://gist.github.com/Lybrial/6431ae8a552f63100f5c) – Mulgard Jun 25 '14 at 04:39
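
The suggestion in the comments (look at the NP / VP labels and split a VP on CC) could be sketched roughly as follows. This is only an illustration under assumptions: Node is a hypothetical stand-in for the Stanford Tree so the example runs without the parser, and the names isCoordinatedVp and expand are made up here. A VP whose children include both a CC and further VPs contributes each child VP as a separate alternative; every other node concatenates the alternatives of its children.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ClauseSplitSketch {

    // Hypothetical stand-in for a parse-tree node (tag or word) with any number of children.
    static final class Node {
        final String label;
        final List<Node> children;

        Node(String label, Node... children) {
            this.label = label;
            this.children = Arrays.asList(children);
        }

        boolean isLeaf() {
            return children.isEmpty();
        }
    }

    // True for a VP that directly contains a conjunction (CC) and at least one child VP.
    static boolean isCoordinatedVp(Node node) {
        if (!node.label.equals("VP")) {
            return false;
        }
        boolean hasCc = false;
        boolean hasVp = false;
        for (Node child : node.children) {
            if (child.label.equals("CC")) hasCc = true;
            if (child.label.equals("VP")) hasVp = true;
        }
        return hasCc && hasVp;
    }

    // Returns every word sequence the subtree yields when each coordinated VP is split.
    static List<String> expand(Node node) {
        List<String> out = new ArrayList<>();
        if (node.isLeaf()) {
            out.add(node.label);                    // a leaf is a word
            return out;
        }
        if (isCoordinatedVp(node)) {
            for (Node child : node.children) {
                if (child.label.equals("VP")) {
                    out.addAll(expand(child));      // each conjunct is its own alternative
                }                                   // CC and commas are dropped
            }
            return out;
        }
        out.add("");
        for (Node child : node.children) {          // combine the children's alternatives
            List<String> next = new ArrayList<>();
            for (String prefix : out) {
                for (String part : expand(child)) {
                    next.add(prefix.isEmpty() ? part : prefix + " " + part);
                }
            }
            out = next;
        }
        return out;
    }

    public static void main(String[] args) {
        Node sentence = new Node("S",
                new Node("NP", new Node("DT", new Node("The")), new Node("NN", new Node("rain"))),
                new Node("VP",
                        new Node("VP", new Node("VBD", new Node("closed")),
                                new Node("NP", new Node("NNS", new Node("airports")))),
                        new Node("CC", new Node("and")),
                        new Node("VP", new Node("VBD", new Node("snapped")),
                                new Node("NP", new Node("NNS", new Node("lines"))))));
        for (String clause : expand(sentence)) {
            System.out.println(clause);
        }
        // prints:
        // The rain closed airports
        // The rain snapped lines
    }
}
```

Because the split is applied at every level, a nested coordination such as "to sleep ... or walk ..." would also produce one clause per conjunct. The sketch does not separate the reduced relative clause inside the subject NP ("ever recorded in India"), and it keeps punctuation outside coordinated VPs, so both would need extra rules. With the real library the same recursion should carry over to the Tree methods isLeaf() and children() used in the question, though how the label is read there is left as an assumption.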