I am trying to use Naive Bayes for spam-ham classification.
    training_set['E_Mail'] = training_set['E_Mail'].str.split()
    vocabulary = []
    for email in training_set['E_Mail']:
        for word in email:
            vocabulary.append(tuple(word))
    vocabulary = list(set(vocabulary))
    word_counts_per_email = {unique_word: [0] * len(training_set['E_Mail']) for unique_word in vocabulary}
    for index, email in enumerate(training_set['E_Mail']):
        for word in email:
            word_counts_per_email[word][index] += 1
I repeatedly get a KeyError in this part:
    word_counts_per_email = {unique_word: [0] * len(training_set['E_Mail']) for unique_word in vocabulary}
    for index, email in enumerate(training_set['E_Mail']):
        for word in email:
            word_counts_per_email[word][index] += 1
The error message is just this:
    ---------------------------------------------------------------------------
    KeyError                                  Traceback (most recent call last)
    <ipython-input-30-1706354aaff0> in <module>()
          3 for index, email in enumerate(training_set['E_Mail']):
          4     for word in email:
    ----> 5         word_counts_per_email[word][index] += 1

    KeyError: 'hafta'
'hafta' is the first word in the pandas DataFrame, i.e. the first word of the training dataset.
I tried the solution on this issue, which seemed similar to mine, but it didn't work.
I would appreciate any hint to get past this. Thank you.
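
A likely cause, judging only from the code shown: `tuple(word)` turns each word into a tuple of its characters, so the dictionary keys are things like `('h', 'a', 'f', 't', 'a')`, while the counting loop looks up the plain string `'hafta'`. Below is a minimal sketch of that diagnosis and one possible fix. It assumes the words should stay as strings; `emails` is a hypothetical plain-Python stand-in for `training_set['E_Mail']` after `.str.split()`, and the sample messages are made up.

```python
# Hypothetical stand-in for training_set['E_Mail'] after .str.split():
# each email is a list of words.
emails = [text.split() for text in ["hafta sonu indirim", "yeni hafta yeni hafta"]]

# tuple() on a string yields its characters, which is why the key
# 'hafta' is missing from the original dictionary.
assert tuple("hafta") == ("h", "a", "f", "t", "a")

# Keep each word as a plain string; a set comprehension removes duplicates.
vocabulary = sorted({word for email in emails for word in email})

# One list of zeros per unique word, with one slot per email.
word_counts_per_email = {
    unique_word: [0] * len(emails) for unique_word in vocabulary
}

# Count how often each word occurs in each email.
for index, email in enumerate(emails):
    for word in email:
        word_counts_per_email[word][index] += 1

print(word_counts_per_email["hafta"])  # [1, 2]
```

With the real DataFrame, the equivalent change would be replacing `vocabulary.append(tuple(word))` with `vocabulary.append(word)`; the rest of the original code can stay as it is.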