Python NLTK : Extract lexical head item from Stanford dependency parsed result

Question

I have a sentence and I want to extract its lexical head item. I can already do the dependency parsing with the Stanford NLP library.

How can I extract the main head of a sentence?

For the sentence `Download and share this tool`, the head would be `Download`.

I've tried the following:

 from nltk.parse.stanford import StanfordDependencyParser

 def get_head_word(text):
     standepparse = StanfordDependencyParser(
         path_to_jar='/home/stanford_resource/stanford-parser-full-2014-06-16/stanford-parser.jar',
         path_to_models_jar='/home/stanford_resource/stanford-parser-full-2014-06-16/stanford-parser-3.4-models.jar',
         model_path='/home/stanford_resource/stanford-parser-full-2014-06-16/stanford-parser-3.4-models/edu/stanford/nlp/models/lexparser/englishPCFG.ser.gz')
     # raw_parse() yields DependencyGraph objects; keep the first parse
     parsetree = standepparse.raw_parse(text)
     p_tree = list(parsetree)[0]
     print(p_tree.to_dot())

 text = 'Download and share this tool'
 get_head_word(text)


Output:

 digraph G{
 edge [dir=forward]
 node [shape=plaintext]

 0 [label="0 (None)"]
 0 -> 1 [label="root"]
 1 [label="1 (Download)"]
 1 -> 2 [label="cc"]
 1 -> 3 [label="conj"]
 1 -> 5 [label="dobj"]
 2 [label="2 (and)"]
 3 [label="3 (share)"]
 4 [label="4 (this)"]
 5 [label="5 (tool)"]
 5 -> 4 [label="det"]
 }

Is this enough: `print([(n['word'], n['head']) for n in p_tree.nodes.values() if n['head'] == 0])`? — aman, Jan 04 '16 at 08:54
Could you rephrase your question a little? What do you mean by "download"? — alvas, Jan 04 '16 at 09:46
@alvas I want to extract the lexical head item. In the sentence above (if I am correct), the head item is `Download`. So for another sentence, `we love python`, the lexical head item would be `love`. — aman, Jan 04 '16 at 10:45
Oh, then you only need to find the node on the "None -> root" edge, which gives you the sentence head: `next(n for n in p_tree.nodes.values() if n['head'] == 0)` — alvas, Jan 04 '16 at 11:29

score 1 · Accepted Answer · answered Jan 04 '16 at 11:39

To find the dependency head of a sentence, look for the node whose `head` value points to the root node. In NLTK's `DependencyGraph` API, the artificial root node (word `None`) sits at address 0 of the `nodes` dictionary, so you look for the node whose `head` is 0.

Do note that in dependency parsing, unlike typical Chomsky normal form / CFG parse trees, there might be more than one head in the dependency parse.
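If you want to cover that case, a minimal sketch is to collect every node attached to the root instead of stopping at the first. The `nodes` dictionary below is a hypothetical stand-in for `p_tree.nodes` (a parse with two root dependents, made up for illustration, with only the keys used here), and `find_root_dependents` is a hypothetical helper, not part of NLTK.

```python
# Hypothetical stand-in for p_tree.nodes: a made-up parse in which two words
# are attached directly to the artificial root at address 0.
nodes = {
    0: {"address": 0, "word": None, "head": None, "rel": None},
    1: {"address": 1, "word": "Download", "head": 0, "rel": "root"},
    2: {"address": 2, "word": "and", "head": 1, "rel": "cc"},
    3: {"address": 3, "word": "share", "head": 0, "rel": "root"},
}


def find_root_dependents(nodes):
    """Hypothetical helper: return every word attached to the root, in sentence order."""
    return [
        n["word"]
        for _, n in sorted(nodes.items())  # sort by address, i.e. word position
        if n["head"] == 0
    ]


print(find_root_dependents(nodes))  # ['Download', 'share']
```

With a Stanford basic-dependency parse like the one in the question, this list holds a single word.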

But since you're casting the dependency output into a Tree structure, you can do the following:

 tree_head = next(n for n in p_tree.nodes.values() if n['head'] == 0)
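To try this lookup without Java or the Stanford jars, here is a minimal sketch. It assumes `p_tree.nodes` maps addresses to dictionaries with `word` and `head` keys; the `nodes` dictionary is hand-built to mirror the graph for `Download and share this tool` (a real `DependencyGraph` node carries more keys, such as `tag` and `lemma`), and `find_sentence_head` is a hypothetical helper.

```python
# Hand-built stand-in for p_tree.nodes, mirroring the graph for
# "Download and share this tool". Only the keys used here are included.
nodes = {
    0: {"address": 0, "word": None, "head": None, "rel": None},
    1: {"address": 1, "word": "Download", "head": 0, "rel": "root"},
    2: {"address": 2, "word": "and", "head": 1, "rel": "cc"},
    3: {"address": 3, "word": "share", "head": 1, "rel": "conj"},
    4: {"address": 4, "word": "this", "head": 5, "rel": "det"},
    5: {"address": 5, "word": "tool", "head": 1, "rel": "dobj"},
}


def find_sentence_head(nodes):
    """Hypothetical helper: return the word of the first node attached to address 0."""
    # The default of None avoids StopIteration when nothing points at the root.
    node = next((n for n in nodes.values() if n["head"] == 0), None)
    return node["word"] if node is not None else None


print(find_sentence_head(nodes))  # Download
```

Passing a default to `next()` is a design choice: a graph with no root dependent returns `None` instead of raising `StopIteration`.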

But do note that, linguistically, the head of the sentence `Download and share this tool` should be `Download and share`. Computationally, however, a tree is hierarchical: a tree like the one in the question hangs `Download` off `ROOT` and makes `and` and `share` dependents of `Download`, but some parsers might produce this tree instead: `ROOT->and->Download;share`.
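If you want the coordinated head (`Download and share`) rather than the single word, one possible sketch for the Stanford-style layout is to join the head with its `cc` and `conj` dependents in word order. It assumes each node has a `deps` mapping from relation name to dependent addresses, as NLTK's `DependencyGraph` nodes do; the `nodes` dictionary is hand-built for illustration and `head_with_conjuncts` is a hypothetical helper.

```python
# Hand-built stand-in for p_tree.nodes; "deps" maps a relation name to the
# addresses of the dependents. Only the keys used here are included.
nodes = {
    0: {"address": 0, "word": None, "head": None, "deps": {"root": [1]}},
    1: {"address": 1, "word": "Download", "head": 0,
        "deps": {"cc": [2], "conj": [3], "dobj": [5]}},
    2: {"address": 2, "word": "and", "head": 1, "deps": {}},
    3: {"address": 3, "word": "share", "head": 1, "deps": {}},
    4: {"address": 4, "word": "this", "head": 5, "deps": {}},
    5: {"address": 5, "word": "tool", "head": 1, "deps": {"det": [4]}},
}


def head_with_conjuncts(nodes, relations=("cc", "conj")):
    """Hypothetical helper: join the sentence head with its coordination dependents."""
    head = next(n for n in nodes.values() if n["head"] == 0)
    addresses = [head["address"]]
    for rel in relations:
        addresses.extend(head["deps"].get(rel, []))
    # Sorting by address restores the original word order.
    return " ".join(nodes[a]["word"] for a in sorted(addresses))


print(head_with_conjuncts(nodes))  # Download and share
```

This only covers the layout where the first conjunct is the head; a parser that produces `ROOT->and->Download;share` would need the walk to start from `and` instead.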
