NLTK - Chunk grammar doesn't read commas

Question

import nltk
from nltk.chunk.util import tagstr2tree
from nltk import word_tokenize, pos_tag
text = "John Rose Center is very beautiful place and i want to go there with Barbara Palvin. Also there are stores like Adidas ,Nike ,Reebok Center."
tagged_text = pos_tag(text.split())

grammar = "NP:{<NNP>+}"

cp = nltk.RegexpParser(grammar)
result = cp.parse(tagged_text)

print(result)

Output:

(S
  (NP John/NNP Rose/NNP Center/NNP)
  is/VBZ
  very/RB
  beautiful/JJ
  place/NN
  and/CC
  i/NN
  want/VBP
  to/TO
  go/VB
  there/RB
  with/IN
  (NP Barbara/NNP Palvin./NNP)
  Also/RB
  there/EX
  are/VBP
  stores/NNS
  like/IN
  (NP Adidas/NNP ,Nike/NNP ,Reebok/NNP Center./NNP))

The grammar I use for chunking only matches NNP tags, but when proper nouns are listed one after another with commas, they still end up in the same chunk. I want my chunks to look like this:

(S
  (NP John/NNP Rose/NNP Center/NNP)
  is/VBZ
  very/RB
  beautiful/JJ
  place/NN
  and/CC
  i/NN
  want/VBP
  to/TO
  go/VB
  there/RB
  with/IN
  (NP Barbara/NNP Palvin./NNP)
  Also/RB
  there/EX
  are/VBP
  stores/NNS
  like/IN
  (NP Adidas,/NNP)
  (NP Nike,/NNP)
  (NP Reebok/NNP Center./NNP))

What should I write in "grammar =", or can I edit the output so it looks like the above? As you can see, I only chunk proper nouns, for my named entity project. Please help me out.

Use `tagged_text = pos_tag(word_tokenize(text))`. Try not to use the GitHub repo's issue tracker to draw attention to an SO question. The issue tracker should be used to report bugs and suggest enhancements. — alvas, Apr 21 '16 at 10:19
Also, it's not related to Java/StanfordNLP, so I've removed those tags. — alvas, Apr 21 '16 at 10:23
Please also avoid posting multiple questions on the same topic that vary only in minimal increments. — alvas, Apr 21 '16 at 10:31
I'm sorry. What you wrote is right, and what I did was not ethical. I closed the issue and linked your answer in the other question. — Arda Nalbant, Apr 21 '16 at 10:40
It's alright, just don't make the same mistake again =) FYI, if you raise/close an issue on the GitHub repo, ALL followers of the repo and developers will receive an email about the issue by default, and it's pretty hard to "erase" it from history because GitHub repos track almost everything. — alvas, Apr 21 '16 at 10:49
I understand. I'm new to both Stack Overflow and GitHub. It won't happen again. By the way, thanks, your answer helped me a lot. — Arda Nalbant, Apr 21 '16 at 10:56

score 2 · Accepted Answer · answered Apr 21 '16 at 10:23

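Use word_tokenize(text) instead of text.split(). Splitting on whitespace leaves punctuation attached to the words (",Nike", "Center."), and those tokens are still tagged NNP, so <NNP>+ merges them into a single chunk. word_tokenize emits commas and periods as separate tokens, and these break up the run of NNP tags: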

>>> import nltk
>>> from nltk.chunk.util import tagstr2tree
>>> from nltk import word_tokenize, pos_tag
>>> text = "John Rose Center is very beautiful place and i want to go there with Barbara Palvin. Also there are stores like Adidas ,Nike ,Reebok Center."
>>> tagged_text = pos_tag(word_tokenize(text))
>>> 
>>> grammar = "NP:{<NNP>+}"
>>> 
>>> cp = nltk.RegexpParser(grammar)
>>> result = cp.parse(tagged_text)
>>> 
>>> print(result)
(S
  (NP John/NNP Rose/NNP Center/NNP)
  is/VBZ
  very/RB
  beautiful/JJ
  place/NN
  and/CC
  i/NN
  want/VBP
  to/TO
  go/VB
  there/RB
  with/IN
  (NP Barbara/NNP Palvin/NNP)
  ./.
  Also/RB
  there/EX
  are/VBP
  stores/NNS
  like/IN
  (NP Adidas/NNP)
  ,/,
  (NP Nike/NNP)
  ,/,
  (NP Reebok/NNP Center/NNP)
  ./.)
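
To see why the tokenizer matters without loading any NLTK models, the difference can be reproduced on the last sentence alone. This is only a sketch: the regular expression below is a rough stand-in for word_tokenize, not NLTK's actual algorithm, and is assumed here purely for illustration.

```python
import re

text = "Also there are stores like Adidas ,Nike ,Reebok Center."

# Whitespace splitting keeps punctuation glued to the neighbouring word,
# so ",Nike" and "Center." stay single tokens.
split_tokens = text.split()

# Rough stand-in for word_tokenize (NOT NLTK's real algorithm):
# a token is either a run of word characters or one punctuation mark.
regex_tokens = re.findall(r"\w+|[^\w\s]", text)

print(split_tokens)
print(regex_tokens)
```

With the punctuation split off, each comma gets its own tag and can no longer sit inside a run of <NNP> tokens.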

Thanks! But how can I rewrite the text like this? Is there a function for rewriting the chunked text? http://stackoverflow.com/questions/36702150/python-re-write-the-text-with-its-proper-nouns-chunked?stw=2 — Arda Nalbant, Apr 21 '16 at 10:59
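
On the follow-up about getting the chunks back out of the parsed result: one possible approach is to walk the tree and join the leaves of every NP subtree. The sketch below assumes that plain strings are what is wanted, which the comment does not confirm. The tagged tokens are copied by hand from the answer's output so that no tagger model is needed to run it.

```python
from nltk import RegexpParser

# Tagged tokens copied from the answer's output (hand-written here,
# so pos_tag and its model are not required for this sketch).
tagged = [("stores", "NNS"), ("like", "IN"), ("Adidas", "NNP"), (",", ","),
          ("Nike", "NNP"), (",", ","), ("Reebok", "NNP"), ("Center", "NNP"),
          (".", ".")]

tree = RegexpParser("NP: {<NNP>+}").parse(tagged)

# Keep only the NP subtrees and join the words of each one into a string.
chunks = [" ".join(word for word, tag in subtree.leaves())
          for subtree in tree.subtrees(filter=lambda t: t.label() == "NP")]

print(chunks)
```

The same loop should work on the full `result` tree from the answer, since it has the same shape.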
