
Is it possible to make NLTK's PunktSentenceTokenizer split sentences that have no whitespace between them?

from nltk.tokenize.punkt import PunktSentenceTokenizer

sent_tokenizer = PunktSentenceTokenizer()
print(sent_tokenizer.tokenize('Sky is blue.Metal is black.'))

'''
Output:

['Sky is blue.Metal is black.']
'''
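The only workaround I can think of is a regex pre-processing step (my own sketch, not a Punkt feature): insert a space after any period that is directly followed by an uppercase letter before tokenizing. That heuristic breaks on abbreviations, so I'd prefer a Punkt-native option if one exists.

import re
from nltk.tokenize.punkt import PunktSentenceTokenizer

def tokenize_glued(text):
    # Assumption: a period immediately followed by an uppercase letter ends a
    # sentence, so insert a space there (this would wrongly split strings like
    # "U.S.Government" into pieces).
    spaced = re.sub(r'\.(?=[A-Z])', '. ', text)
    return PunktSentenceTokenizer().tokenize(spaced)

print(tokenize_glued('Sky is blue.Metal is black.'))
# ['Sky is blue.', 'Metal is black.']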
revy

0 Answers