
I want to analyze sentences with NLTK and display their chunks as a tree. NLTK offers the method tree.draw() to draw a tree. The following code draws a tree for the sentence "the little yellow dog barked at the cat":

import nltk 
sentence = [("the", "DT"), ("little", "JJ"), ("yellow", "JJ"), ("dog", "NN"), ("barked","VBD"), ("at", "IN"), ("the", "DT"), ("cat", "NN")]

pattern = "NP: {<DT>?<JJ>*<NN>}"
NPChunker = nltk.RegexpParser(pattern) 
result = NPChunker.parse(sentence)
result.draw()

The result is this tree:

[image: example tree]

How do I get a tree with one more level, like this?

[image: deeper tree]

Edited by alvas; asked by raxer
  • @alvas, why did you change the title to "Flatten"? I wouldn't say the OP's question is about flattening: the branching of the trees is the same in both examples. Rather, the OP is asking for a separate level for the PoS tags (btw: not only for the non-NP words...). – lenz Aug 11 '15 at 19:28
  • @lenz, because that's most probably what he'll need. The NP pattern he was using is what people use for term extraction, noun/entity extraction, etc. And `.draw()` is purely presentation, so it won't change the parse results much =) Just to double check, @raxer, is that what you're asking for? – alvas Aug 11 '15 at 21:13
  • @alvas, my aim is to have a different presentation. Take, for example, the word yellow. It has the POS tag `JJ`. In the first picture, `JJ` is on the same layer as yellow. But in the second picture, `JJ` is on a layer above the word yellow. How can I display it like the second picture? – raxer Aug 17 '15 at 07:26
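To see why the tag sits on the same layer as the word, it helps to inspect what `parse()` actually returns: each chunk is an `nltk.Tree`, but every other token, and every word inside a chunk, is still a plain `(word, tag)` tuple, so word and tag form a single node. A minimal sketch, assuming NLTK 3.x is installed; it prints the structure instead of calling `draw()`, so it runs without a display.

```python
import nltk
from nltk import Tree

# The POS-tagged sentence from the question: a list of (word, tag) tuples.
sentence = [("the", "DT"), ("little", "JJ"), ("yellow", "JJ"), ("dog", "NN"),
            ("barked", "VBD"), ("at", "IN"), ("the", "DT"), ("cat", "NN")]

np_chunker = nltk.RegexpParser("NP: {<DT>?<JJ>*<NN>}")
result = np_chunker.parse(sentence)

# Chunks come back as Tree objects; everything else stays a (word, tag) tuple.
for child in result:
    if isinstance(child, Tree):
        # Inside a chunk the leaves are still tuples, so word and tag share one node.
        print(child.label(), child.leaves())
    else:
        print("token", child)
```

Getting the deeper picture therefore means turning those tuples into subtrees of their own.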

2 Answers


You need to "level up" your non-NP words. Here's a hack:

import nltk 
sentence = [("the", "DT"), ("little", "JJ"), ("yellow", "JJ"), ("dog", "NN"), ("barked","VBD"), ("at", "IN"), ("the", "DT"), ("cat", "NN")]

pattern = """NP: {<DT>?<JJ>*<NN>}
VBD: {<VBD>}
IN: {<IN>}"""
NPChunker = nltk.RegexpParser(pattern) 
result = NPChunker.parse(sentence)
result.draw()
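A quick way to check what this hack changes, sketched under the assumption that NLTK 3.x is installed and printing in place of `draw()`: the grammar is applied stage by stage, so after the NP rule the VBD and IN rules wrap the remaining top-level tokens in their own chunks. As lenz points out in the comments, the words inside the NP chunks are not affected and remain `(word, tag)` tuples.

```python
import nltk

sentence = [("the", "DT"), ("little", "JJ"), ("yellow", "JJ"), ("dog", "NN"),
            ("barked", "VBD"), ("at", "IN"), ("the", "DT"), ("cat", "NN")]

# Same grammar as the answer: one extra single-tag chunk rule per non-NP tag.
pattern = """NP: {<DT>?<JJ>*<NN>}
VBD: {<VBD>}
IN: {<IN>}"""
result = nltk.RegexpParser(pattern).parse(sentence)

# Every top-level child is now a subtree with its own label...
print([child.label() for child in result])
# ...but the words inside the NP chunks are still (word, tag) tuples.
print(result[0][0])
```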

[out]:

[image: resulting tree]


Answered by alvas

I know it's a little late to answer, but here is how I did it. The idea is that you need to convert each (word, tag) pair of your sentence into a tree whose label is the tag and whose only child is the word.

import nltk
from nltk import Tree
# Wrap each (word, tag) pair in a subtree labelled with its tag: Tree(tag, [word])
sentence = [Tree(tag, [word]) for word, tag in sentence]

Then you can do the chunking as before:

NPChunker = nltk.RegexpParser(pattern) 
result = NPChunker.parse(sentence)
result.draw()
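Put together, the approach in this answer looks roughly like the self-contained sketch below, assuming NLTK 3.x. It relies on `RegexpParser` treating the label of a `Tree` token as that token's tag, so the original NP rule still matches the wrapped words. `Tree.pretty_print()` is used instead of `draw()` only so the sketch also runs without a graphical display.

```python
import nltk
from nltk import Tree

sentence = [("the", "DT"), ("little", "JJ"), ("yellow", "JJ"), ("dog", "NN"),
            ("barked", "VBD"), ("at", "IN"), ("the", "DT"), ("cat", "NN")]

# Wrap every (word, tag) pair in a one-word subtree labelled with the tag,
# so the tag becomes a node one level above the word.
tagged_trees = [Tree(tag, [word]) for word, tag in sentence]

# The chunker reads each Tree's label as its tag, so the NP rule still applies.
np_chunker = nltk.RegexpParser("NP: {<DT>?<JJ>*<NN>}")
result = np_chunker.parse(tagged_trees)

# Text rendering of the tree; result.draw() gives the graphical version.
result.pretty_print()
```

Unlike the chunk-per-tag hack, this puts a tag node above every word, including the words inside the NP chunks.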

Here's my resulting tree:

[image: resulting tree]
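The same wrapping can also be applied after chunking, if you already have a result from the original tuple-based parse. This is a minimal sketch assuming NLTK 3.x; `deepen` is a hypothetical helper written for illustration, not an NLTK function. It walks the chunk tree recursively and replaces each `(word, tag)` leaf with `Tree(tag, [word])`, leaving the chunk structure itself unchanged.

```python
import nltk
from nltk import Tree


def deepen(tree):
    """Hypothetical helper: return a copy of a chunk tree in which every
    (word, tag) leaf is replaced by the subtree Tree(tag, [word])."""
    children = []
    for child in tree:
        if isinstance(child, Tree):
            children.append(deepen(child))  # recurse into chunks such as NP
        else:
            word, tag = child
            children.append(Tree(tag, [word]))
    return Tree(tree.label(), children)


sentence = [("the", "DT"), ("little", "JJ"), ("yellow", "JJ"), ("dog", "NN"),
            ("barked", "VBD"), ("at", "IN"), ("the", "DT"), ("cat", "NN")]

result = nltk.RegexpParser("NP: {<DT>?<JJ>*<NN>}").parse(sentence)
deep = deepen(result)  # the original result is left untouched
deep.pretty_print()
```

Wrapping before or after chunking should give the same tree here; wrapping afterwards just avoids changing the input you pass to the chunker.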