I ran this code in the past and it worked fine. A couple of months later, it keeps causing the kernel to die. I reinstalled and updated everything conda/Python related, but it doesn't seem to matter. It stalls on the last line, and no error message is printed. It worked once and failed 7 of the last 8 times.
corpus = df['reviewText']

import re

import nltk
import numpy as np  # needed for np.vectorize below

nltk.download('stopwords')

wpt = nltk.WordPunctTokenizer()
stop_words = nltk.corpus.stopwords.words('english')

def normalize_document(doc):
    # lower-case and remove special characters/extra whitespace
    # (re.sub's fourth positional argument is count, not flags,
    # so the flags have to be passed by keyword)
    doc = re.sub(r'[^a-zA-Z\s]', '', doc, flags=re.I | re.A)
    doc = doc.lower()
    doc = doc.strip()
    # tokenize document
    tokens = wpt.tokenize(doc)
    # filter stopwords out of document
    filtered_tokens = [token for token in tokens if token not in stop_words]
    # re-create document from filtered tokens
    doc = ' '.join(filtered_tokens)
    return doc

normalize_corpus = np.vectorize(normalize_document)
norm_corpus = normalize_corpus(corpus)
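One thing I was planning to try is swapping np.vectorize for a plain loop with progress output, so I can at least see which review it stalls on, and skip any non-string entries in case a NaN snuck into the column (I haven't verified that it did). Just a diagnostic sketch, not tested on my full data:

# Diagnostic: process reviews one at a time so the failing row is visible.
norm_corpus = []
for i, doc in enumerate(corpus):
    if i % 1000 == 0:
        print(f'processing review {i}')  # progress marker to see where it stalls
    if not isinstance(doc, str):
        norm_corpus.append('')  # guard against NaN or other non-string values
        continue
    norm_corpus.append(normalize_document(doc))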
Happy to hear any suggestions or ideas. If there is some way to display an error, or the reason for the kernel dying, please let me know.
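In the meantime, I came across the standard-library faulthandler module, which is supposed to dump a Python traceback on hard crashes (segfaults and the like). Would enabling it at the top of the notebook be expected to show anything when the kernel dies? Something like:

import faulthandler
faulthandler.enable()  # dump a traceback if the interpreter crashes hard

From what I've read, the traceback would go to the terminal where the kernel was launched rather than the notebook itself, but I haven't confirmed that.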