Drawing a flattened NLTK parse tree with NP chunks

Question

I want to analyze sentences with NLTK and display their chunks as a tree. NLTK offers the method tree.draw() to draw a tree. The following code draws a tree for the sentence "the little yellow dog barked at the cat":

import nltk 
sentence = [("the", "DT"), ("little", "JJ"), ("yellow", "JJ"), ("dog", "NN"), ("barked","VBD"), ("at", "IN"), ("the", "DT"), ("cat", "NN")]

pattern = "NP: {<DT>?<JJ>*<NN>}"
NPChunker = nltk.RegexpParser(pattern) 
result = NPChunker.parse(sentence)
result.draw()

The result is this tree:

How do I get a tree with one more level, like this?

@alvas, why did you change the title to "Flatten"? I wouldn't say the OP's question is about flattening – the branching of the trees is the same in both examples. Rather, the OP is asking for a separate level for the PoS tags (btw: not only for the non-NP words...). — lenz, Aug 11 '15 at 19:28
@lenz, because most probably that's what he'll need. Because the NP pattern he was using is what people do with term extraction, noun/entities extraction, etc. And the `.draw()` is purely presentation, so it won't change much of the parse results =) Just to double check, @raxer, is that what you're asking for? — alvas, Aug 11 '15 at 21:13
@alvas, my aim is to have a different presentation. Take for example the word yellow. It has the POS tag `JJ`. In the first picture, `JJ` is on the same layer as yellow. In the second picture, `JJ` sits one layer above the word yellow. How can I display it as in the second picture? — raxer, Aug 17 '15 at 07:26

score 4 · Answer 1 · answered Aug 11 '15 at 08:51

You need to "level up" your non-NP words. Here's a hack:

import nltk 
sentence = [("the", "DT"), ("little", "JJ"), ("yellow", "JJ"), ("dog", "NN"), ("barked","VBD"), ("at", "IN"), ("the", "DT"), ("cat", "NN")]

pattern = """NP: {<DT>?<JJ>*<NN>}
VBD: {<VBD>}
IN: {<IN>}"""
NPChunker = nltk.RegexpParser(pattern) 
result = NPChunker.parse(sentence)
result.draw()

[out]:
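The pattern above gives `VBD` and `IN` their own nodes, but the tokens inside each chunk are still (word, tag) tuples. As a rough sketch of an alternative, the chunk tree from the original single-rule pattern can be post-processed so that every tagged leaf gets its own tag node. The helper name `lift_tags` is hypothetical and not part of NLTK; only `nltk.RegexpParser` and `nltk.Tree` are library names.

```python
import nltk
from nltk import Tree

def lift_tags(tree):
    """Return a copy of `tree` where each (word, tag) leaf becomes Tree(tag, [word]).

    Hypothetical helper for illustration; not an NLTK function.
    """
    children = []
    for child in tree:
        if isinstance(child, Tree):
            # Chunk node (e.g. NP): recurse into it.
            children.append(lift_tags(child))
        else:
            # Plain tagged token: make the tag a parent node of the word.
            word, tag = child
            children.append(Tree(tag, [word]))
    return Tree(tree.label(), children)

sentence = [("the", "DT"), ("little", "JJ"), ("yellow", "JJ"), ("dog", "NN"),
            ("barked", "VBD"), ("at", "IN"), ("the", "DT"), ("cat", "NN")]

chunker = nltk.RegexpParser("NP: {<DT>?<JJ>*<NN>}")
lifted = lift_tags(chunker.parse(sentence))

print(lifted)    # bracketed text form of the tree
# lifted.draw()  # uncomment to open the drawing window
```

With this approach the grammar needs no extra rule per tag, since the extra level is added after chunking.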

score 0 · Answer 2 · answered May 28 '18 at 07:26

I know it's a little too late to answer, but here is how I've done it. The idea is that you need to convert each tagged word of your sentence into a small tree.

import nltk
from nltk import Tree
# Wrap each (word, tag) pair in its own subtree so the tag becomes the parent node of the word.
sentence = [Tree(tag, [word]) for word, tag in sentence]

Then you can do the chunking afterwards.

NPChunker = nltk.RegexpParser(pattern) 
result = NPChunker.parse(sentence)
result.draw()

Here's my resulting tree:
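To check the structure without opening the drawing window, the fragments above can be assembled into one script and the tree inspected as text. This is only a sketch: it assumes the sentence and pattern from the question, and the variable names `tagged` and `wrapped` are illustrative.

```python
import nltk
from nltk import Tree

tagged = [("the", "DT"), ("little", "JJ"), ("yellow", "JJ"), ("dog", "NN"),
          ("barked", "VBD"), ("at", "IN"), ("the", "DT"), ("cat", "NN")]
pattern = "NP: {<DT>?<JJ>*<NN>}"

# Pre-wrap every token: the tag becomes a node and the word its only leaf.
wrapped = [Tree(tag, [word]) for word, tag in tagged]

chunker = nltk.RegexpParser(pattern)
result = chunker.parse(wrapped)  # chunk rules match on the subtree labels
plain = chunker.parse(tagged)    # the question's original tree, for comparison

print(result)                           # bracketed text form of the tree
print(plain.height(), result.height())  # the wrapped tree is one level taller
# result.draw()                         # uncomment to open the drawing window
```

Comparing `height()` of the two trees is a quick way to confirm that the extra level for the POS tags is present.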
