I have trained a Logistic Regression classifier to predict whether a review is positive or negative. Now I want to append the predicted probabilities returned by the predict_proba function to my Pandas data frame containing the reviews. I tried something like:
test_data['prediction'] = sentiment_model.predict_proba(test_matrix)
Obviously, that doesn't work, since predict_proba returns a 2D NumPy array with one column per class. So, what is the most efficient way of doing this? I created test_matrix with scikit-learn's CountVectorizer:
from sklearn.feature_extraction.text import CountVectorizer
vectorizer = CountVectorizer(token_pattern=r'\b\w+\b')
train_matrix = vectorizer.fit_transform(train_data['review_clean'].values.astype('U'))
test_matrix = vectorizer.transform(test_data['review_clean'].values.astype('U'))
The result I want looks like this:
| Review                                      | Prediction |
| ------------------------------------------- | ---------- |
| "Toy was great! Our six-year old loved it!" | 0.986      |
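
To make the goal concrete, here is a minimal sketch of the kind of assignment I have in mind. It rests on assumptions: the toy reviews and the sentiment label column (with values 1 and -1) are made up for illustration, and the positive class is looked up through the model's classes_ attribute rather than assumed to sit in a fixed column.

```python
import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

# Hypothetical toy data standing in for the real train_data / test_data.
# The "sentiment" column name and the 1 / -1 labels are assumptions.
train_data = pd.DataFrame({
    "review_clean": [
        "great toy loved it",
        "terrible broke quickly",
        "loved it great value",
        "terrible waste of money",
    ],
    "sentiment": [1, -1, 1, -1],
})
test_data = pd.DataFrame({"review_clean": ["great toy", "terrible toy"]})

# Same vectorizer setup as in the question.
vectorizer = CountVectorizer(token_pattern=r"\b\w+\b")
train_matrix = vectorizer.fit_transform(train_data["review_clean"].values.astype("U"))
test_matrix = vectorizer.transform(test_data["review_clean"].values.astype("U"))

sentiment_model = LogisticRegression()
sentiment_model.fit(train_matrix, train_data["sentiment"])

# predict_proba returns shape (n_samples, n_classes); the columns follow
# the order of sentiment_model.classes_.
proba = sentiment_model.predict_proba(test_matrix)

# Find the column that belongs to the positive label (assumed to be 1 here)
# and assign that single column to the data frame.
positive_col = list(sentiment_model.classes_).index(1)
test_data["prediction"] = proba[:, positive_col]

print(test_data)
```

Slicing one column out of the 2D array gives a 1D array of the same length as the data frame, which is what a single-column assignment needs.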