
I am working on a project involving anaphora resolution via Hobbs' algorithm. I have parsed my text using the Stanford parser, and now I would like to manipulate the nodes in order to implement the algorithm.

At the moment, I don't understand how to:

  • Access a node based on its POS tag (e.g. I need to start with a pronoun: how do I get all pronouns?).

  • Use visitors. I'm a bit of a Java noob, but in C++ I would implement a Visitor functor and then work on its hooks. I could not find much on this for the Stanford Parser's Tree structure, though. Is that jgrapht? If it is, could you give me some pointers to code snippets?

Tex

2 Answers


@dhg's answer works fine, but here are two other options that it might also be useful to know about:

  • The Tree class implements Iterable. You can iterate through all the nodes of a Tree, or, strictly, the subtrees headed by each node, in a pre-order traversal, with:

    List<Tree> pronouns = new ArrayList<Tree>();
    for (Tree subtree : t) {
        if (subtree.label().value().equals("PRP")) {
            pronouns.add(subtree);
        }
    }
    
  • You can also get just the nodes that satisfy some (potentially quite complex) pattern by using tregex, which behaves rather like java.util.regex, allowing pattern matches over trees. You would have something like:

    TregexPattern tgrepPattern = TregexPattern.compile("PRP");
    TregexMatcher m = tgrepPattern.matcher(t);
    while (m.find()) {
        Tree subtree = m.getMatch();
        pronouns.add(subtree);
    }
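To make the pre-order behaviour of the `for (Tree subtree : t)` loop concrete, here is a minimal sketch of how an `Iterable` tree can hand out its subtrees in pre-order using an explicit stack. To keep it runnable without CoreNLP on the classpath, a hypothetical `Node` class stands in for `edu.stanford.nlp.trees.Tree`; this is not how the real class is necessarily implemented, only an illustration of the same traversal order.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

// Hypothetical stand-in for edu.stanford.nlp.trees.Tree: a label plus child nodes.
class Node implements Iterable<Node> {
    final String label;
    final List<Node> children;

    Node(String label, Node... children) {
        this.label = label;
        this.children = Arrays.asList(children);
    }

    // Pre-order iterator: yield a node, then its subtrees from left to right.
    @Override
    public Iterator<Node> iterator() {
        Deque<Node> stack = new ArrayDeque<>();
        stack.push(this);
        return new Iterator<Node>() {
            @Override
            public boolean hasNext() {
                return !stack.isEmpty();
            }

            @Override
            public Node next() {
                if (stack.isEmpty()) throw new NoSuchElementException();
                Node current = stack.pop();
                // Push children right-to-left so the leftmost child is popped first.
                for (int i = current.children.size() - 1; i >= 0; i--) {
                    stack.push(current.children.get(i));
                }
                return current;
            }
        };
    }

    // Penn Treebank-style bracketed output, e.g. (PRP he).
    @Override
    public String toString() {
        if (children.isEmpty()) return label;
        StringBuilder sb = new StringBuilder("(").append(label);
        for (Node child : children) sb.append(' ').append(child);
        return sb.append(')').toString();
    }
}

class PreOrderDemo {
    // The parse of "The dog walks and he barks ." built by hand.
    static Node sampleTree() {
        return new Node("ROOT",
            new Node("S",
                new Node("S",
                    new Node("NP", new Node("DT", new Node("The")), new Node("NN", new Node("dog"))),
                    new Node("VP", new Node("VBZ", new Node("walks")))),
                new Node("CC", new Node("and")),
                new Node("S",
                    new Node("NP", new Node("PRP", new Node("he"))),
                    new Node("VP", new Node("VBZ", new Node("barks")))),
                new Node(".", new Node("."))));
    }

    // Same shape as the loop over a real Tree: test each subtree's label.
    static List<Node> findPronouns(Node t) {
        List<Node> pronouns = new ArrayList<>();
        for (Node subtree : t) {
            if (subtree.label.equals("PRP")) pronouns.add(subtree);
        }
        return pronouns;
    }

    public static void main(String[] args) {
        System.out.println(findPronouns(sampleTree())); // prints [(PRP he)]
    }
}
```

Pushing children in reverse order is what keeps the output in left-to-right surface order, which matters if you later need the pronouns in the order they occur in the sentence.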
    
Christopher Manning
  • Thank you! The .children() method is enough for me to write a BFS iterator on it, but for future reference, why isn't jgrapht compatible? I read that you used that library to build your tree structure, so I thought it would be! The last thing I need is to get the word labeled by the POS tag: say I have t = (PRP he). With t.label().value() I can get PRP; how do I get "he"? Thank you again! =) – Tex May 09 '12 at 12:21
  • The Tree class doesn't use jgrapht. (While it has been retrofitted in various ways, such as implementing Iterable, the Tree class was originally written against JDK 1.1, before jgrapht existed, and in some ways shows its age...) We did previously use jgrapht behind our dependency graphs, but have recently moved away from it. Our impression was that many people preferred fewer library dependencies. To get the word, you go one level down: `subtree.firstChild().label().value()`. – Christopher Manning May 09 '12 at 16:52
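Going one level down from a POS-tag node generalizes to collecting every word together with its tag. A minimal sketch, assuming a hypothetical `Node` stand-in whose `isPreTerminal()` and `firstChild()` mirror the methods of the same name on `edu.stanford.nlp.trees.Tree` (on the real class the word would be `t.firstChild().label().value()`):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class TaggedWords {
    // Hypothetical stand-in for a Stanford Tree node: a label and its children.
    static final class Node {
        final String label;
        final List<Node> children;

        Node(String label, Node... children) {
            this.label = label;
            this.children = Arrays.asList(children);
        }

        boolean isLeaf() {
            return children.isEmpty();
        }

        // A preterminal is a POS-tag node whose only child is the word leaf.
        boolean isPreTerminal() {
            return children.size() == 1 && children.get(0).isLeaf();
        }

        Node firstChild() {
            return children.isEmpty() ? null : children.get(0);
        }
    }

    // Collect "word/TAG" strings left to right by going one level down from each preterminal.
    static List<String> taggedWords(Node t) {
        List<String> out = new ArrayList<>();
        collect(t, out);
        return out;
    }

    private static void collect(Node t, List<String> out) {
        if (t.isPreTerminal()) {
            out.add(t.firstChild().label + "/" + t.label);
            return;
        }
        for (Node child : t.children) collect(child, out);
    }

    public static void main(String[] args) {
        Node t = new Node("S",
            new Node("NP", new Node("PRP", new Node("he"))),
            new Node("VP", new Node("VBZ", new Node("barks"))));
        System.out.println(taggedWords(t)); // prints [he/PRP, barks/VBZ]
    }
}
```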

Here's a simple example that parses a sentence and finds all of the pronouns.

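    import java.util.ArrayList;

    import edu.stanford.nlp.parser.lexparser.LexicalizedParser;
    import edu.stanford.nlp.trees.Tree;

    public class FindPronouns {

        // Recursively collect every PRP (personal pronoun) subtree.
        private static ArrayList<Tree> findPro(Tree t) {
            ArrayList<Tree> pronouns = new ArrayList<Tree>();
            if (t.label().value().equals("PRP"))
                pronouns.add(t);
            else
                for (Tree child : t.children())
                    pronouns.addAll(findPro(child));
            return pronouns;
        }

        public static void main(String[] args) {
            // Loads the default English PCFG model.
            LexicalizedParser parser = LexicalizedParser.loadModel();
            // Older releases accept a raw String here; newer CoreNLP releases use parser.parse(String) instead.
            Tree x = parser.apply("The dog walks and he barks .");
            System.out.println(x);
            ArrayList<Tree> pronouns = findPro(x);
            System.out.println("All Pronouns: " + pronouns);
        }
    }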

This prints:

    (ROOT (S (S (NP (DT The) (NN dog)) (VP (VBZ walks))) (CC and) (S (NP (PRP he)) (VP (VBZ barks))) (. .)))
    All Pronouns: [(PRP he)]
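On the visitor question: Java has no functors, so the usual equivalent is to pass an object that implements an interface, often as an anonymous class. The sketch below recasts the recursion in `findPro` as a depth-first walk with `enter`/`exit` hooks. `Node`, `NodeVisitor` and `walk` are hypothetical, illustrative names, not part of the Stanford API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class VisitorDemo {
    // Hypothetical stand-in for a Stanford Tree node.
    static final class Node {
        final String label;
        final List<Node> children;

        Node(String label, Node... children) {
            this.label = label;
            this.children = Arrays.asList(children);
        }
    }

    // Hypothetical visitor interface: override only the hooks you need.
    interface NodeVisitor {
        default void enter(Node node, int depth) {}
        default void exit(Node node, int depth) {}
    }

    // Depth-first walk: enter() fires before a node's children, exit() after them.
    static void walk(Node node, NodeVisitor visitor) {
        walk(node, visitor, 0);
    }

    private static void walk(Node node, NodeVisitor visitor, int depth) {
        visitor.enter(node, depth);
        for (Node child : node.children) walk(child, visitor, depth + 1);
        visitor.exit(node, depth);
    }

    public static void main(String[] args) {
        Node t = new Node("S",
            new Node("NP", new Node("PRP", new Node("he"))),
            new Node("VP", new Node("VBZ", new Node("barks"))));

        // Only the enter hook is needed to record each pronoun and its depth.
        List<String> pronouns = new ArrayList<>();
        walk(t, new NodeVisitor() {
            @Override
            public void enter(Node node, int depth) {
                if (node.label.equals("PRP")) {
                    pronouns.add(node.children.get(0).label + "@" + depth);
                }
            }
        });
        System.out.println(pronouns); // prints [he@2]
    }
}
```

Default methods let each visitor implement only the hooks it cares about, much like overriding selected hooks of a C++ visitor base class.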
dhg
  • Thank you! Can you give me some more information about visiting this tree? For example, is t.children() a BFS visit? – Tex May 07 '12 at 08:33
    @Davide: `children` returns just the children, so you need to build a DFS or BFS traversal yourself. See the [API docs](http://nlp.stanford.edu/nlp/javadoc/javanlp/edu/stanford/nlp/trees/Tree.html). – Fred Foo May 07 '12 at 08:42
  • One more question: I get the POS tag with t.label().value(). How do I get the actual word? That is, say I have t = (PRP he). With t.label().value() I can get PRP; how do I get "he"? – Tex May 07 '12 at 13:25
  • @Tex you can get "he" with t.children()[0].value(), since children() returns an array. – swapyonubuntu Jul 05 '15 at 09:14
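As noted in the comments, `children()` only returns a node's immediate children, so the traversal order is up to you. Hobbs' algorithm searches parts of the tree left-to-right and breadth-first, so a queue-based level-order walk is the building block you would likely need. A minimal sketch, assuming a hypothetical `Node` stand-in; with the real `Tree`, whose `children()` returns an array, you would enqueue `Arrays.asList(current.children())` instead.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Queue;

class BreadthFirst {
    // Hypothetical stand-in for a Stanford Tree node.
    static final class Node {
        final String label;
        final List<Node> children;

        Node(String label, Node... children) {
            this.label = label;
            this.children = Arrays.asList(children);
        }
    }

    // Level-order (breadth-first) traversal, left to right within each level.
    static List<Node> bfs(Node root) {
        List<Node> order = new ArrayList<>();
        Queue<Node> queue = new ArrayDeque<>();
        queue.add(root);
        while (!queue.isEmpty()) {
            Node current = queue.remove();
            order.add(current);
            // Enqueue children left to right so siblings keep their surface order.
            queue.addAll(current.children);
        }
        return order;
    }

    public static void main(String[] args) {
        Node t = new Node("S",
            new Node("NP", new Node("PRP", new Node("he"))),
            new Node("VP", new Node("VBZ", new Node("barks"))));
        List<String> labels = new ArrayList<>();
        for (Node n : bfs(t)) labels.add(n.label);
        System.out.println(labels); // prints [S, NP, VP, PRP, VBZ, he, barks]
    }
}
```

Swapping the `Queue` for a stack (pushing children in reverse) turns the same loop into a pre-order DFS, so both orders come from one small piece of code.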