Well, I am building a sentiment analysis classifier with three classes (labels): positive, neutral and negative. My training data has shape (14640, 15), with the following class counts:

- negative: 9178
- neutral: 3099
- positive: 2363
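For reference, here is a minimal sketch of the weights these three classes receive under `class_weight='balanced'`, which the SVM code below uses. It relies on scikit-learn's `compute_class_weight`, which applies the same rule: each class c gets n_samples / (n_classes × count_c). The label codes 0/1/2 assume the alphabetical order that `LabelEncoder` produces (negative, neutral, positive).

```python
import numpy as np
from sklearn.utils.class_weight import compute_class_weight

# Class counts from the question; codes assume LabelEncoder's alphabetical order
counts = {0: 9178, 1: 3099, 2: 2363}  # 0=negative, 1=neutral, 2=positive

# Rebuild a label vector with exactly these counts
y = np.concatenate([np.full(n, label) for label, n in counts.items()])

# 'balanced' weight for class c = n_samples / (n_classes * count_c)
weights = compute_class_weight(class_weight="balanced", classes=np.array([0, 1, 2]), y=y)
manual = len(y) / (3 * np.bincount(y))

print(weights)  # approximately [0.53, 1.57, 2.07]
```

The minority classes are weighted up and the majority class is weighted down, so errors on neutral and positive tweets cost more during training.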
I pre-processed the data to standardize it and applied bag-of-words vectorization to the tweet texts so that they can be fed to the model; the resulting matrix has shape (14640, 1000). Since the labels (Y) are text, I applied LabelEncoder to encode them as a single array of integers, like this:
```
[1 2 1 ... 1 0 1]
```
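A minimal sketch of these two preprocessing steps on hypothetical toy tweets. It assumes scikit-learn's `CountVectorizer` with `max_features=1000` as the bag-of-words step, which matches the 1000 columns above but is an assumption about how the matrix was built. It also shows which integer each sentiment gets from `LabelEncoder`.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.preprocessing import LabelEncoder

# Hypothetical stand-ins for the tweet texts and their sentiment labels
texts = ["great flight, thanks", "my bag is lost again", "flight is on time", "worst service ever"]
labels = ["positive", "negative", "neutral", "negative"]

# Bag-of-words: keep at most the 1000 most frequent tokens (assumed setting)
vectorizer = CountVectorizer(max_features=1000)
bow = vectorizer.fit_transform(texts)  # sparse matrix, shape (n_texts, vocabulary size)

# LabelEncoder sorts the class names, so the codes follow alphabetical order
encoder = LabelEncoder()
Y = encoder.fit_transform(labels)

mapping = {name: code for code, name in enumerate(encoder.classes_.tolist())}
print(mapping)  # {'negative': 0, 'neutral': 1, 'positive': 2}
print(Y)        # [2 0 1 0]
```

So in the encoded array, `0` is negative, `1` is neutral and `2` is positive.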
This is how I split my dataset:

```python
from sklearn.model_selection import train_test_split

X_train, X_test, Y_train, Y_test = train_test_split(bow, Y, test_size=0.3, stratify=Y, random_state=42)
print(X_train.shape, Y_train.shape)
print(X_test.shape, Y_test.shape)
```

Output:

```
(10248, 1000) (10248,)
(4392, 1000) (4392,)
```
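A minimal sketch to check the class proportions that `stratify=Y` produces in each split. It uses a dummy feature matrix and a synthetic label vector that reproduces the class counts above; the real features do not affect the stratification.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Labels with the class counts from the question (0=negative, 1=neutral, 2=positive)
Y = np.repeat([0, 1, 2], [9178, 3099, 2363])
X = np.arange(len(Y)).reshape(-1, 1)  # dummy features; only the labels matter here

X_train, X_test, Y_train, Y_test = train_test_split(
    X, Y, test_size=0.3, stratify=Y, random_state=42
)

# Print the share of each class in the full data and in each split
for name, labels in [("all", Y), ("train", Y_train), ("test", Y_test)]:
    print(name, np.round(np.bincount(labels) / len(labels), 3))
# All three rows are roughly [0.627 0.212 0.161]: still about 63% negative.
```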
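`stratify=Y` makes the class proportions in the training and test sets match those of the full dataset; it does not rebalance the imbalanced classes. To compensate for the imbalance, I pass `class_weight='balanced'` to the classifier. For the classifier itself, I used an SVM: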
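```python
from sklearn import svm
from sklearn.metrics import accuracy_score, precision_score

# Linear SVM; class weights are inversely proportional to class frequencies
svc = svm.SVC(kernel='linear', C=1, probability=True, class_weight='balanced').fit(X_train, Y_train)
prediction = svc.predict_proba(X_test)  # shape (n_samples, 3): one probability column per class

# Threshold the probability of class 1 (carried over from my binary classifier)
prediction_int = prediction[:, 1] >= 0.3
prediction_int = prediction_int.astype(int)  # np.int is deprecated; use the builtin int
print(prediction_int)
print('Precision score: ', precision_score(Y_test, prediction_int, average=None))
print('Accuracy Score: ', accuracy_score(Y_test, prediction_int))
```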
Output:

```
[0 0 0 ... 1 0 0]
Precision score: [0.74185137 0.50075529 0. ]
Accuracy Score: 0.6691712204007286
/usr/local/lib/python3.6/dist-packages/sklearn/metrics/classification.py:1437: UndefinedMetricWarning: Precision is ill-defined and being set to 0.0 in labels with no predicted samples.
  'precision', 'predicted', average, warn_for)
```
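The `UndefinedMetricWarning` in the output above means that at least one label never appears among the predictions. Precision for a class is its true positives divided by the number of samples predicted as that class, so a class that is never predicted has a zero denominator. A minimal sketch with hypothetical labels reproduces this. The `zero_division` argument, available in scikit-learn 0.22 and later, sets the value reported for such a label explicitly and suppresses the warning.

```python
import numpy as np
from sklearn.metrics import precision_score

# Hypothetical labels: class 2 occurs in y_true but is never predicted
y_true = np.array([0, 1, 2, 2, 0, 1])
y_pred = np.array([0, 1, 1, 0, 0, 1])

# Precision for class c = TP_c / (number of samples predicted as c).
# Class 2 has no predicted samples, so its precision is undefined;
# zero_division=0 reports 0.0 for it without raising the warning.
per_class = precision_score(y_true, y_pred, average=None, labels=[0, 1, 2], zero_division=0)
print(per_class)  # class 0: 2/3, class 1: 2/3, class 2: 0.0
```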
@desertnaut helped me a lot in pinning down the actual problem: the classifier never predicts the third class. In the printed `prediction_int` there is not a single `2`, and the predictions are far from the actual labels. I am worried that I made a mistake somewhere in the classification step. I originally wrote this classifier code for binary classification, and I assumed I would not need to change it for multi-class classification. Can anyone help me solve this?
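A minimal sketch with made-up probabilities showing that thresholding a single column of `predict_proba` can only produce labels 0 and 1, while taking the argmax across all three columns can produce any of the three labels. The probability matrix is hypothetical. With a fitted scikit-learn model, an argmax index `i` corresponds to the label `svc.classes_[i]`. Note that scikit-learn's documentation warns that for `SVC`, the Platt-scaled `predict_proba` output can be inconsistent with `svc.predict`.

```python
import numpy as np

# Hypothetical predict_proba output for 4 samples and 3 classes
# (columns: 0=negative, 1=neutral, 2=positive; each row sums to 1)
proba = np.array([
    [0.70, 0.20, 0.10],
    [0.10, 0.60, 0.30],
    [0.05, 0.15, 0.80],
    [0.20, 0.35, 0.45],
])

# Binary-style rule: look only at column 1 and threshold it.
# The comparison yields True/False, i.e. 0 or 1, so label 2 can never appear.
binary_style = (proba[:, 1] >= 0.3).astype(int)
print(binary_style)  # [0 1 0 1]

# Multi-class rule: pick the column with the highest probability in each row.
multi_class = proba.argmax(axis=1)
print(multi_class)  # [0 1 2 2]
```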