I'm writing a small sentiment analysis program in Python by training a Naive Bayes classifier with positive and negative examples of online reviews.
My problem concerns the feature extraction step. Currently I'm using a bag of words to hold all of the features. I have a couple of functions that go over the list of words in the feature set and remove stopwords, as well as a stemmer and a lemmatizer. I can enable or disable each of these functions to see its effect on the final accuracy of the classifier.
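To make the setup concrete, here is a minimal sketch of a switchable preprocessing step of the kind described above. Everything in it is a hypothetical stand-in rather than the actual code: `STOPWORDS` is a tiny hand-written set, `toy_stem` is a crude suffix stripper used in place of a real stemmer or lemmatizer, and `preprocess` with its two flags is just one way the enable/disable switches could look.

```python
# Hypothetical stand-ins: a tiny stopword set and a crude suffix-stripping
# "stemmer". A real program would use a proper stopword list and stemmer.
STOPWORDS = {"the", "a", "is", "and", "this", "was", "it"}


def toy_stem(word):
    """Strip one common suffix, keeping at least three characters."""
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def preprocess(tokens, remove_stopwords=True, stem=True):
    """Apply only the enabled steps to a list of lowercase tokens."""
    if remove_stopwords:
        tokens = [t for t in tokens if t not in STOPWORDS]
    if stem:
        tokens = [toy_stem(t) for t in tokens]
    return tokens


tokens = ["the", "movie", "was", "amazing"]
print(preprocess(tokens))                          # both steps enabled
print(preprocess(tokens, stem=False))              # stopword removal only
print(preprocess(tokens, remove_stopwords=False))  # stemming only
```

Each flag can be flipped independently, so the classifier can be retrained and scored once per combination.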
I've never done anything in sentiment analysis before, so forgive me if this is a basic question.
Do I run these functions only on the bag-of-words feature set, or do they need to be run on the text of the reviews as well? The accuracy either doesn't change or goes down when I run these functions over the feature set alone, so I thought maybe I need to run them over the review text in the training and testing sets as well.
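The following sketch illustrates why I suspect this matters. It is a toy under stated assumptions, not my actual code: `toy_stem` is a crude stand-in for a real stemmer, `extract_features` is a hypothetical presence-based bag-of-words extractor, and the reviews are made up. If only the vocabulary is stemmed, a raw token such as "amazing" no longer matches the stemmed entry "amaz", so that feature is never switched on.

```python
def toy_stem(word):
    """Crude stand-in for a real stemmer: strip one common suffix."""
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def extract_features(review_tokens, vocabulary):
    """Hypothetical extractor: one boolean presence feature per vocabulary word."""
    present = set(review_tokens)
    return {"contains(%s)" % w: (w in present) for w in vocabulary}


# Made-up training reviews, already tokenized and lowercased.
training_reviews = [["amazing", "acting"], ["boring", "plot"]]

# Vocabulary built from the training text, then stemmed.
raw_vocab = {t for review in training_reviews for t in review}
stemmed_vocab = sorted({toy_stem(t) for t in raw_vocab})

review = ["amazing", "plot"]

# Case 1: stemmed vocabulary, raw review text (the mismatch).
mismatched = extract_features(review, stemmed_vocab)

# Case 2: the same stemming applied to the review text as well.
consistent = extract_features([toy_stem(t) for t in review], stemmed_vocab)

print(mismatched)
print(consistent)
```

In case 1 only `contains(plot)` is true, because "plot" happens to be unchanged by stemming; in case 2 both `contains(amaz)` and `contains(plot)` are true. Whether this is what is actually hurting the accuracy in my program is something I have not confirmed.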