I want to segment text whenever a punctuation mark occurs in a sentence or paragraph. If I include a comma (,) in my regex, it also splits off individual nouns, verbs, or adjectives that are separated by commas. Suppose we have "dogs, cats, rats and other animals". Here "dogs" becomes a separate chunk, which I do not want. Is there any way, using regex or some other means in nltk, to ignore these list commas so that only comma-separated clauses come back as text segments?
Code
from nltk import sent_tokenize
import re
text = "Peter Mattei's 'Love in the Time of Money' is a visually stunning film to watch. Mrs. Mattei offers us a vivid portrait about human relations. This is a movie that seems to be telling us what money, power and success do to people in the different situation we encounter."
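# Protect the period after common titles (Dr., Mrs., Mr., Ms., Prof.) so it is not treated as a sentence end
text = re.sub("(?<=..Dr|.Mrs|..Mr|..Ms|Prof)[.]", "<prd>", text)
# Split on sentence-ending periods, semicolons, colons, question marks, exclamation marks, and quote boundaries
txt = re.split(r'\.\s|;|:|\?|\'\s|"\s|!|\s\'|\s\"', text)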
print(txt)
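One possible way to tell list commas from clause commas with plain regex is a look-ahead heuristic: treat a comma as a list separator when another comma or an "and"/"or" appears within the next couple of words. The sketch below is only an illustration of that idea. The names `is_list_comma`, `split_clauses`, and `LIST_WINDOW` are hypothetical, the window size of 2 is an arbitrary assumption, and the rule will misjudge some sentences (for example "After lunch, Tom and Mary left"). A POS-tag based approach would likely be more robust.

```python
import re

# Assumption: a comma belongs to a list if "and"/"or" or another comma
# shows up within this many words after it. The value 2 is arbitrary.
LIST_WINDOW = 2


def is_list_comma(rest, window=LIST_WINDOW):
    """Guess whether the comma that precedes `rest` separates list items."""
    for word in rest.split()[:window]:
        if word.lower() in ("and", "or"):
            return True
        if word.endswith(","):
            return True
    return False


def split_clauses(sentence):
    """Split on commas that do not look like list separators."""
    chunks = []
    start = 0
    for match in re.finditer(r",", sentence):
        # Keep list commas inside the current chunk; split on the others.
        if not is_list_comma(sentence[match.end():]):
            chunks.append(sentence[start:match.start()].strip())
            start = match.end()
    chunks.append(sentence[start:].strip())
    return [chunk for chunk in chunks if chunk]


if __name__ == "__main__":
    print(split_clauses("dogs, cats, rats and other animals"))
    print(split_clauses("When the film ended, the audience left quietly"))
```

With this heuristic, "dogs, cats, rats and other animals" stays in one chunk, while "When the film ended, the audience left quietly" is split at the comma. It could be applied to each piece returned by the `re.split` call above instead of adding a comma to that pattern.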