I am trying to build a dependency tree from CoNLL input using the NLTK DependencyGraph class. As I understand it, this class provides a tree() method that builds a tree structure of the dependencies, but without the relations between heads and dependents. The tree also lacks the POS tags. There is also a triples() method that yields the head, the relation, and the dependent, each word paired with its POS tag. With the triples, it is hard for me to find the dependents when a word is repeated in the sentence, as in "the red car is behind the blue car", because the index of the word is not included in the triples. Here we have two different nodes for the same word, "car".
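To make the ambiguity concrete, here is a minimal sketch using the English sentence from the sample data. It assumes a recent NLTK version in which triples() yields (word, tag) pairs without token indices; the names sentence, triples and car_heads are only illustrative.

```python
from nltk.parse import DependencyGraph

# One sentence in 10-column CoNLL format; "car" occurs at positions 3 and 8.
sentence = """1 the _ DET DT _ 3 det _ _
2 blue _ ADJ JJ _ 3 amod _ _
3 car _ NOUN NN _ 4 nsubj _ _
4 is _ VERB VBZ _ 0 ROOT _ _
5 behind _ ADP IN _ 4 prep _ _
6 the _ DET DT _ 8 det _ _
7 red _ ADJ JJ _ 8 amod _ _
8 car _ NOUN NN _ 5 pobj _ _
"""

graph = DependencyGraph(sentence)

# tree() keeps only the words: no relations and no POS tags.
tree = graph.tree()
print(tree)

# triples() keeps relations and tags, but not the token index.
triples = list(graph.triples())
for triple in triples:
    print(triple)

# All triples whose head word is "car": both "car" nodes are mixed together,
# so it is unclear which "car" each dependent belongs to.
car_heads = [t for t in triples if t[0][0] == "car"]
print(car_heads)
```

The triples headed by "car" come from two different nodes, yet nothing in the output says which one, and the two triples for "the" are indistinguishable.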
So how can I get, from CoNLL input, a dependency tree with the head word, its tags, its relation, and its children? A similar data structure in which this information (head word, its tags, relation, children) can be looked up for a given sentence would also work. Any suggestion is welcome. Below is some code that can be used as a starting point.
from nltk.parse import DependencyGraph
conll_data2 = """1 Cathy Cathy N N eigen|ev|neut 2 su _ _
2 zag zie V V trans|ovt|1of2of3|ev 0 ROOT _ _
3 hen hen Pron Pron per|3|mv|datofacc 2 obj1 _ _
4 wild wild Adj Adj attr|stell|onverv 5 mod _ _
5 zwaaien zwaai N N soort|mv|neut 2 vc _ _
6 . . Punc Punc punt 5 punct _ _

1 the _ DET DT _ 3 det _ _
2 blue _ ADJ JJ _ 3 amod _ _
3 car _ NOUN NN _ 4 nsubj _ _
4 is _ VERB VBZ _ 0 ROOT _ _
5 behind _ ADP IN _ 4 prep _ _
6 the _ DET DT _ 8 det _ _
7 red _ ADJ JJ _ 8 amod _ _
8 car _ NOUN NN _ 5 pobj _ _

1 Ze ze Pron Pron per|3|evofmv|nom 2 su _ _
2 had heb V V trans|ovt|1of2of3|ev 0 ROOT _ _
3 met met Prep Prep voor 8 mod _ _
4 haar haar Pron Pron bez|3|ev|neut|attr 5 det _ _
5 moeder moeder N N soort|ev|neut 3 obj1 _ _
6 kunnen kan V V hulp|ott|1of2of3|mv 2 vc _ _
7 gaan ga V V hulp|inf 6 vc _ _
8 winkelen winkel V V intrans|inf 11 cnj _ _
9 , , Punc Punc komma 8 punct _ _
10 zwemmen zwem V V intrans|inf 11 cnj _ _
11 of of Conj Conj neven 7 vc _ _
12 terrassen terras N N soort|mv|neut 11 cnj _ _
13 . . Punc Punc punt 12 punct _ _
"""
graphs = [DependencyGraph(entry)
          for entry in conll_data2.split('\n\n') if entry]
for graph in graphs:
    # Find a data structure here that gives the head word, its tag, relation, and children.
    pass  # placeholder so the loop body is syntactically valid