Regarding Feature Extraction for Sentiment Analysis

Asked Jul 29 '17 at 18:26

I'm writing a small sentiment analysis program in Python by training a Naive Bayes classifier with positive and negative examples of online reviews.

My problem concerns the feature extraction step. Currently I'm using a bag of words to hold all of the features. I have a function that goes over the list of words in the featureset and removes stopwords, plus a stemmer and a lemmatizer. I can enable or disable each of these to see its effect on the final accuracy of the classifier.

I've never done anything in sentiment analysis before, so forgive me if it's a basic question.

Do I run these functions only on the bag of words featureset, or do they need to be run on the text in the reviews as well? The accuracy measure either doesn't change or goes down when I run these functions over the featureset, so I thought maybe I needed to run them over the review text in the testing/training set as well.

solum

You would be better off asking this question on https://stats.stackexchange.com/ – tupui Jul 29 '17 at 18:44
You should run "your functions" on the text of each review, then build the bag of words from the processed text. That bag of words is the input to your classifier. – THe_strOX Jul 31 '17 at 07:49
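
To illustrate the suggestion in the comment above, here is a minimal sketch of running the same preprocessing over every review before building its bag of words. It is not the asker's code: `STOPWORDS`, `crude_stem`, `preprocess` and `bag_of_words` are hypothetical names, the stopword list is a tiny made-up sample, and `crude_stem` is only a stand-in for a real stemmer or lemmatizer.

```python
import re
from collections import Counter

# Hypothetical, tiny stopword list for illustration only;
# a real list (for example from an NLP library) is much larger.
STOPWORDS = {"the", "a", "an", "is", "was", "it", "this", "and", "i"}


def crude_stem(word):
    """Stand-in for a real stemmer: strips a few common suffixes."""
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def preprocess(text, remove_stopwords=True, stem=True):
    """Tokenize one review and apply the optional normalization steps."""
    tokens = re.findall(r"[a-z']+", text.lower())
    if remove_stopwords:
        tokens = [t for t in tokens if t not in STOPWORDS]
    if stem:
        tokens = [crude_stem(t) for t in tokens]
    return tokens


def bag_of_words(tokens):
    """Count how often each (already normalized) token occurs."""
    return dict(Counter(tokens))


train_review = "I loved this movie and the acting was amazing"
test_review = "The actors loved working on it"

# Same preprocessing for training and test text, so the features line up.
train_features = bag_of_words(preprocess(train_review))
test_features = bag_of_words(preprocess(test_review))

# Stemming only one side breaks the match: "loved" never equals "lov".
unstemmed_test_features = bag_of_words(preprocess(test_review, stem=False))

shared = set(train_features) & set(test_features)
shared_unstemmed = set(train_features) & set(unstemmed_test_features)
```

In this sketch `shared` contains the stemmed token both reviews have in common, while `shared_unstemmed` is empty. That mismatch is one plausible reason accuracy stays flat or drops when only the featureset is normalized and the review text is left as-is.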