I'm building a full results pipeline for sentiment analysis on a smaller subset of the IMDB reviews (only 2k positive, 2k negative), and I want to show results at each stage:
i.e. first without any pre-processing, then with basic cleaning (removing special characters, removing stopwords, lowercasing), then with stemming and lemmatization (tested separately) on top of the basic cleaning.
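For reference, here's a minimal sketch of what I mean by those steps (assuming NLTK for the stopword list, PorterStemmer, and WordNetLemmatizer; the function names are mine):

```python
import re

from nltk.corpus import stopwords
from nltk.stem import PorterStemmer, WordNetLemmatizer

# one-time setup: nltk.download("stopwords"); nltk.download("wordnet")
STOPWORDS = set(stopwords.words("english"))
stemmer = PorterStemmer()
lemmatizer = WordNetLemmatizer()

def basic_clean(text):
    # lowercase, strip anything that isn't a letter, drop stopwords
    text = re.sub(r"[^a-z\s]", " ", text.lower())
    return " ".join(w for w in text.split() if w not in STOPWORDS)

def stem(text):
    return " ".join(stemmer.stem(w) for w in text.split())

def lemmatize(text):
    return " ".join(lemmatizer.lemmatize(w) for w in text.split())
```

On a sample like "movies running", the stemmer gives "movi run" while the lemmatizer (which defaults to treating words as nouns) gives "movie running", so the two variants produce noticeably different vocabularies.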
After basic cleaning I jump from ~50% accuracy (which makes sense, since that's chance level for binary classification) to the low-to-mid 80s. Then, adding stemming or lemmatization on top either changes nothing or, in the case of random forest, drops recall below 80%.
Why is this the case? Are my results normal? If so, how do you justify using either one?
Also, note that all of the models and feature extractors are using sklearn's default parameters; I haven't gotten to the model-optimization part yet. Should I try tuning each of these three cases and then see whether they still perform worse?
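If it helps to be concrete, this is the kind of tuning I have in mind (a sketch using GridSearchCV on one of the pipelines; the grid values are placeholders I picked, not tuned recommendations):

```python
from sklearn.pipeline import Pipeline
from sklearn.model_selection import GridSearchCV
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

# hypothetical grid -- the values are illustrative only
pipe = Pipeline([
    ("tfidf", TfidfVectorizer()),
    ("clf", LogisticRegression(max_iter=1000)),
])
param_grid = {
    "tfidf__ngram_range": [(1, 1), (1, 2)],
    "tfidf__min_df": [1, 2, 5],
    "clf__C": [0.1, 1.0, 10.0],
}
search = GridSearchCV(pipe, param_grid, cv=5, scoring="f1_macro", n_jobs=-1)
# search.fit(train_texts, train_labels)  # repeat per preprocessing variant,
#                                        # then compare search.best_score_
```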
Feature Extractions: Bag of Words and TF-Idf
Models: SVM, Logistic Regression, Multinomial Naive Bayes and Random Forest
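For completeness, the whole grid of runs looks roughly like this (a sketch with every estimator on sklearn defaults; I'm assuming LinearSVC for the SVM, and train_texts/test_texts/train_labels/test_labels are hypothetical names for my train/test split of the cleaned subset):

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer
from sklearn.svm import LinearSVC
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import MultinomialNB
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report

vectorizers = {"BOW": CountVectorizer, "TF-IDF": TfidfVectorizer}
models = {
    "SVM": LinearSVC,  # assumption: sklearn.svm.SVC would also fit "SVM"
    "LR": LogisticRegression,
    "MNB": MultinomialNB,
    "RFC": RandomForestClassifier,
}

for model_name, Model in models.items():
    for vec_name, Vec in vectorizers.items():
        vec = Vec()  # fresh vectorizer per run, fit on the training fold only
        X_train = vec.fit_transform(train_texts)
        X_test = vec.transform(test_texts)
        clf = Model().fit(X_train, train_labels)
        print(f"{model_name} {vec_name}")
        print(classification_report(test_labels, clf.predict(X_test)))
```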
Results:
Basic Cleaning (remove specials, stopwords, lowercasing)
SVM BOW
precision recall f1-score support
Positive 0.85 0.85 0.85 530
Negative 0.83 0.83 0.83 470
accuracy 0.84 1000
macro avg 0.84 0.84 0.84 1000
weighted avg 0.84 0.84 0.84 1000
SVM TF-IDF
precision recall f1-score support
Positive 0.85 0.88 0.86 530
Negative 0.86 0.83 0.84 470
accuracy 0.85 1000
macro avg 0.86 0.85 0.85 1000
weighted avg 0.86 0.85 0.85 1000
LR BOW
precision recall f1-score support
Positive 0.87 0.85 0.86 530
Negative 0.83 0.85 0.84 470
accuracy 0.85 1000
macro avg 0.85 0.85 0.85 1000
weighted avg 0.85 0.85 0.85 1000
LR TF-IDF
precision recall f1-score support
Positive 0.89 0.82 0.85 530
Negative 0.81 0.88 0.84 470
accuracy 0.85 1000
macro avg 0.85 0.85 0.85 1000
weighted avg 0.85 0.85 0.85 1000
MNB BOW
precision recall f1-score support
Positive 0.83 0.85 0.84 530
Negative 0.82 0.81 0.82 470
accuracy 0.83 1000
macro avg 0.83 0.83 0.83 1000
weighted avg 0.83 0.83 0.83 1000
MNB TF-IDF
precision recall f1-score support
Positive 0.86 0.84 0.85 530
Negative 0.82 0.85 0.83 470
accuracy 0.84 1000
macro avg 0.84 0.84 0.84 1000
weighted avg 0.84 0.84 0.84 1000
RFC BOW
precision recall f1-score support
Positive 0.85 0.80 0.82 530
Negative 0.79 0.84 0.81 470
accuracy 0.82 1000
macro avg 0.82 0.82 0.82 1000
weighted avg 0.82 0.82 0.82 1000
RFC TF-IDF
precision recall f1-score support
Positive 0.84 0.81 0.83 530
Negative 0.80 0.83 0.81 470
accuracy 0.82 1000
macro avg 0.82 0.82 0.82 1000
weighted avg 0.82 0.82 0.82 1000
Basic Cleaning + Stemming
SVM BOW
precision recall f1-score support
Positive 0.85 0.82 0.83 530
Negative 0.80 0.83 0.82 470
accuracy 0.82 1000
macro avg 0.82 0.82 0.82 1000
weighted avg 0.82 0.82 0.82 1000
SVM TF-IDF
precision recall f1-score support
Positive 0.85 0.85 0.85 530
Negative 0.83 0.83 0.83 470
accuracy 0.84 1000
macro avg 0.84 0.84 0.84 1000
weighted avg 0.84 0.84 0.84 1000
LR BOW
precision recall f1-score support
Positive 0.85 0.83 0.84 530
Negative 0.81 0.84 0.83 470
accuracy 0.83 1000
macro avg 0.83 0.83 0.83 1000
weighted avg 0.83 0.83 0.83 1000
LR TF-IDF
precision recall f1-score support
Positive 0.89 0.81 0.85 530
Negative 0.80 0.88 0.84 470
accuracy 0.84 1000
macro avg 0.84 0.85 0.84 1000
weighted avg 0.85 0.84 0.84 1000
MNB BOW
precision recall f1-score support
Positive 0.83 0.84 0.84 530
Negative 0.82 0.81 0.82 470
accuracy 0.83 1000
macro avg 0.83 0.83 0.83 1000
weighted avg 0.83 0.83 0.83 1000
MNB TF-IDF
precision recall f1-score support
Positive 0.87 0.83 0.85 530
Negative 0.82 0.86 0.84 470
accuracy 0.84 1000
macro avg 0.84 0.84 0.84 1000
weighted avg 0.84 0.84 0.84 1000
RFC BOW
precision recall f1-score support
Positive 0.84 0.77 0.80 530
Negative 0.76 0.83 0.79 470
accuracy 0.80 1000
macro avg 0.80 0.80 0.80 1000
weighted avg 0.80 0.80 0.80 1000
RFC TF-IDF
precision recall f1-score support
Positive 0.83 0.79 0.81 530
Negative 0.78 0.81 0.80 470
accuracy 0.80 1000
macro avg 0.80 0.80 0.80 1000
weighted avg 0.80 0.80 0.80 1000
Basic Cleaning + Lemmatization
SVM BOW
precision recall f1-score support
Positive 0.84 0.83 0.83 530
Negative 0.81 0.82 0.82 470
accuracy 0.83 1000
macro avg 0.83 0.83 0.83 1000
weighted avg 0.83 0.83 0.83 1000
SVM TF-IDF
precision recall f1-score support
Positive 0.85 0.86 0.86 530
Negative 0.84 0.83 0.84 470
accuracy 0.85 1000
macro avg 0.85 0.85 0.85 1000
weighted avg 0.85 0.85 0.85 1000
LR BOW
precision recall f1-score support
Positive 0.86 0.84 0.85 530
Negative 0.82 0.84 0.83 470
accuracy 0.84 1000
macro avg 0.84 0.84 0.84 1000
weighted avg 0.84 0.84 0.84 1000
LR TF-IDF
precision recall f1-score support
Positive 0.88 0.81 0.84 530
Negative 0.80 0.87 0.84 470
accuracy 0.84 1000
macro avg 0.84 0.84 0.84 1000
weighted avg 0.84 0.84 0.84 1000
MNB BOW
precision recall f1-score support
Positive 0.82 0.85 0.83 530
Negative 0.82 0.80 0.81 470
accuracy 0.82 1000
macro avg 0.82 0.82 0.82 1000
weighted avg 0.82 0.82 0.82 1000
MNB TF-IDF
precision recall f1-score support
Positive 0.85 0.83 0.84 530
Negative 0.81 0.84 0.82 470
accuracy 0.83 1000
macro avg 0.83 0.83 0.83 1000
weighted avg 0.83 0.83 0.83 1000
RFC BOW
precision recall f1-score support
Positive 0.84 0.78 0.81 530
Negative 0.77 0.83 0.80 470
accuracy 0.80 1000
macro avg 0.80 0.81 0.80 1000
weighted avg 0.81 0.80 0.80 1000
RFC TF-IDF
precision recall f1-score support
Positive 0.84 0.81 0.82 530
Negative 0.80 0.82 0.81 470
accuracy 0.82 1000
macro avg 0.82 0.82 0.82 1000
weighted avg 0.82 0.82 0.82 1000