I am doing some research into NLP with Python and have noticed something strange.
Consider the following negative tweets:
```python
neg_tweets = [('I do not like this car', 'negative'),
              ('This view is horrible', 'negative'),
              ('I feel tired this morning', 'negative'),
              ('I am not looking forward to the concert', 'negative'),  # <---
              ('He is my enemy', 'negative')]
```
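I then processed them by removing stop words:
```python
from nltk.corpus import stopwords  # needs nltk.download('stopwords') once

# pos_tweets is defined the same way as neg_tweets (not shown here)
clean_data = []
stop_words = set(stopwords.words("english"))
for (words, sentiment) in pos_tweets + neg_tweets:
    # The stop-word test runs before .lower(), so a capitalised "I" is kept
    words_filtered = [e.lower() for e in words.split() if e not in stop_words]
    clean_data.append((words_filtered, sentiment))
```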
Part of the output is:
```
(['i', 'looking', 'forward', 'concert'], 'negative')
```
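The following self-contained sketch reproduces that output without NLTK. It assumes a small, hand-picked subset of stop words in place of `stopwords.words("english")` (each word in it also appears in NLTK's English list), and the function names `clean` and `clean_lowercase_first` are hypothetical. It also shows why 'i' survives: the membership test runs before `.lower()`, so the capitalised 'I' never matches the lowercase entry. 'not' is dropped either way.

```python
# Hypothetical subset of NLTK's English stop words, so the sketch runs offline.
STOP_WORDS = {"i", "am", "not", "the", "to", "do", "this", "is", "he", "my"}

def clean(text, stop_words):
    """Mirror the loop above: test membership first, lowercase afterwards."""
    return [e.lower() for e in text.split() if e not in stop_words]

def clean_lowercase_first(text, stop_words):
    """Variant that lowercases before testing, so 'I' is removed as well."""
    return [e for e in text.lower().split() if e not in stop_words]

tweet = "I am not looking forward to the concert"
print(clean(tweet, STOP_WORDS))                  # ['i', 'looking', 'forward', 'concert']
print(clean_lowercase_first(tweet, STOP_WORDS))  # ['looking', 'forward', 'concert']
```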
I'm struggling to understand why the stop-word list includes 'not', a word that can change the sentiment of a tweet.
My understanding is that stop words carry no value in terms of sentiment.
So my question is: why is 'not' included in the stop-word list?
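To make the concern concrete, here is a minimal sketch of what stop-word filtering does to the first tweet, again using a hypothetical subset of the stop-word list (NLTK's real English list does contain 'no', 'nor' and 'not'). It also shows one possible workaround, which is an assumption and not NLTK's default behaviour: subtracting a hand-chosen set of negations (`NEGATIONS`, hypothetical) from the stop words before filtering.

```python
# Hypothetical subset of NLTK's English stop words, including its negations.
STOP_WORDS = {"i", "do", "not", "no", "nor", "this", "am", "the", "to"}

# Assumption: a hand-chosen set of negations to keep; NLTK provides no such set.
NEGATIONS = {"no", "nor", "not"}

def filter_tokens(text, stop_words):
    """Lowercase first, then drop every token found in stop_words."""
    return [t for t in text.lower().split() if t not in stop_words]

tweet = "I do not like this car"

# Plain filtering: the negation vanishes and the tweet now reads as positive.
print(filter_tokens(tweet, STOP_WORDS))              # ['like', 'car']

# Negation-aware filtering: keep the negations by removing them from the set.
print(filter_tokens(tweet, STOP_WORDS - NEGATIONS))  # ['not', 'like', 'car']
```

Set difference leaves the original `STOP_WORDS` untouched, so both variants can be compared side by side on the same data.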