I'm working on sentiment analysis on an imbalanced dataset. My problem is that a plain SVM classifier gives a better ROC-AUC score than SVM with oversampling. These are the plain SVM results (the first bracketed value is the ROC-AUC score, the second the G-mean score, and the third the F1 score):
And these are the oversampling+SVM results:
This is my code for the plain SVM:
from sklearn import metrics
from sklearn.metrics import classification_report, confusion_matrix, f1_score, roc_auc_score
from sklearn.svm import SVC
from imblearn.metrics import geometric_mean_score

clf = SVC(kernel='linear', C=1, probability=True)
clf.fit(tf_idf_train3, polarity_train)

# Probability of the positive class, used as the ranking score for ROC-AUC
probs = clf.predict_proba(tf_idf_test3)
preds = probs[:, 1]
fpr, tpr, threshold = metrics.roc_curve(polarity_test, preds)

# Hard class predictions for the threshold-dependent metrics
pred = clf.predict(tf_idf_test3)

roc_auc = roc_auc_score(polarity_test, preds, average='macro')
print(classification_report(polarity_test, pred))
print(confusion_matrix(polarity_test, pred))
gmean = geometric_mean_score(polarity_test, pred, average='macro')
f1 = f1_score(polarity_test, pred, average='macro')
And this is my code for SVM + oversampling:
from imblearn.over_sampling import RandomOverSampler

# Resample only the training set; the test set stays untouched
ros = RandomOverSampler()  # note: `ros` was not defined in the snippet above
X_resampled, y_resampled = ros.fit_resample(tf_idf_train3, polarity_train)

clf = SVC(kernel='linear', C=1, probability=True)
clf.fit(X_resampled, y_resampled)

# Probability of the positive class, used as the ranking score for ROC-AUC
probs = clf.predict_proba(tf_idf_test3)
preds = probs[:, 1]
fpr, tpr, threshold = metrics.roc_curve(polarity_test, preds)

# Hard class predictions for the threshold-dependent metrics
pred = clf.predict(tf_idf_test3)

roc_auc = roc_auc_score(polarity_test, preds, average='macro')
print(classification_report(polarity_test, pred))
print(confusion_matrix(polarity_test, pred))
gmean = geometric_mean_score(polarity_test, pred, average='macro')
f1 = f1_score(polarity_test, pred, average='macro')
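For reference, the plain-vs-oversampled comparison can be reproduced end to end without the original data. Everything in the sketch below is a stand-in assumption, not the original setup: make_classification replaces the TF-IDF features, a manual NumPy resample replaces imblearn's RandomOverSampler (so it runs with scikit-learn alone), and decision_function scores replace predict_proba as the ranking input to ROC-AUC.

```python
# Minimal, self-contained sketch of the two pipelines on synthetic
# imbalanced data (all names here are stand-ins for the original setup).
import numpy as np
from sklearn.datasets import make_classification
from sklearn.metrics import f1_score, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Synthetic 90/10 imbalanced binary dataset in place of the TF-IDF matrices
X, y = make_classification(n_samples=1000, weights=[0.9, 0.1], random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=42)

def evaluate(X_tr, y_tr):
    """Fit a linear SVM and return (ROC-AUC, macro F1) on the held-out test set."""
    clf = SVC(kernel='linear', C=1)
    clf.fit(X_tr, y_tr)
    scores = clf.decision_function(X_test)  # ranking scores for ROC-AUC
    preds = clf.predict(X_test)             # hard labels for F1
    return roc_auc_score(y_test, scores), f1_score(y_test, preds, average='macro')

# Plain SVM on the imbalanced training set
auc_plain, f1_plain = evaluate(X_train, y_train)

# Random oversampling: duplicate minority samples until the classes balance
# (a manual stand-in for imblearn's RandomOverSampler)
rng = np.random.RandomState(42)
minority = X_train[y_train == 1]
n_extra = (y_train == 0).sum() - (y_train == 1).sum()
idx = rng.choice(len(minority), n_extra, replace=True)
X_res = np.vstack([X_train, minority[idx]])
y_res = np.concatenate([y_train, np.ones(n_extra, dtype=int)])

auc_over, f1_over = evaluate(X_res, y_res)
print(f"plain:       AUC={auc_plain:.3f}  macro-F1={f1_plain:.3f}")
print(f"oversampled: AUC={auc_over:.3f}  macro-F1={f1_over:.3f}")
```

One detail worth noting: ROC-AUC is computed from the continuous scores, while G-mean and F1 are computed from the hard 0.5-threshold predictions, so oversampling can move the threshold-dependent metrics without moving ROC-AUC much.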