
I have difficulty plotting OneClassSVM's AUC in Python (I am using sklearn, which generates a confusion matrix like [[tp, fp], [fn, tn]] with fn=tn=0).

import matplotlib.pyplot as plt
from sklearn.metrics import roc_curve, auc

fpr, tpr, thresholds = roc_curve(y_test, y_nb_predicted)
roc_auc = auc(fpr, tpr)  # this raises ValueError [1]
print("Area under the ROC curve : %f" % roc_auc)
plt.plot(fpr, tpr, label='ROC curve (area = %0.2f)' % roc_auc)

I want to handle error [1] and plot AUC for OneClassSVM.

[1] ValueError: Input contains NaN, infinity or a value too large for dtype('float64').
imkhan

2 Answers


Please see my answer on a similar question. The gist is:

  • OneClassSVM fundamentally doesn't support converting a decision into a probability score, so you cannot pass the necessary scores into functions that require varying a score threshold, such as for ROC or Precision-Recall curves and scores.

  • You can approximate this type of score by computing the maximum value of your OneClassSVM's decision function across your input data, call it MAX, and then scoring the prediction for a given observation y by computing y_score = MAX - decision_function(y) (see the sketch after this list).

  • Use these scores to pass as y_score to functions such as average_precision_score, etc., which will accept non-thresholded scores instead of probabilities.

  • Finally, keep in mind that ROC will make less physical sense for OneClassSVM specifically, because OneClassSVM is intended for situations with an expected, huge class imbalance (outliers vs. non-outliers), and ROC will not accurately up-weight the relative success on the small number of outliers.
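
A minimal sketch of the MAX-based scoring described above, assuming `X_train`, `X_test`, and `y_test` are your own data and that `y_test` is encoded with 1 for outliers and 0 for inliers (the names and the label encoding are placeholders, not part of the original answer):

import numpy as np
from sklearn.svm import OneClassSVM
from sklearn.metrics import roc_auc_score, average_precision_score

# Fit on (mostly inlier) training data; X_train, X_test, y_test are placeholders.
clf = OneClassSVM(gamma='auto')
clf.fit(X_train)

# decision_function is large for inliers and small/negative for outliers.
df = clf.decision_function(X_test).ravel()

# Offset by the maximum so every score is >= 0 and a larger score means
# "more outlying" (a lower decision value gives a larger y_score).
MAX = df.max()
y_score = MAX - df

# y_test is assumed to use 1 for outliers and 0 for inliers here.
print("ROC AUC:", roc_auc_score(y_test, y_score))
print("Average precision:", average_precision_score(y_test, y_score))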

ely
  • To calculate AUROC, can you confirm why `MAX` is needed? Negating `decision_score` should be sufficient in my estimation since adding an offset to all values should not affect AUROC. If you think otherwise, please correct me. – ZaydH Feb 28 '19 at 12:11
  • @ZaydH `MAX` is not required. There are many transformations that could work, and plain negation would be fine too. However, the transformation using `MAX` is a popular and somewhat standard approach because it rescales the outlier score onto the positive x-axis (a lower score means a bigger outlier), which can be very helpful for plotting these "scores" and doing other operations on them, since you know you can easily compare with zero. – ely Feb 28 '19 at 14:19
  • @ely Would you have a paper reference that uses the `MAX` transformation, please? The only paper I know uses the negation (`scores = (-1.0) * self.svm.decision_function(self.K_test)` here: https://github.com/lukasruff/Deep-SVDD/blob/master/src/svm.py ) – Blupon Feb 03 '20 at 09:14
  • @ely Sorry, but I don't see why we cannot have a ROC curve based on score_samples. Could you elaborate on that? https://github.com/scikit-learn/scikit-learn/blob/95d4f0841/sklearn/svm/_classes.py#L1279 – partizanos May 03 '20 at 23:17
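
As a quick illustration of the point made in these comments, the data below is synthetic and purely hypothetical; it only shows that AUROC is unchanged between plain negation and the MAX offset, since both order the samples identically:

import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.RandomState(0)
y_true = rng.randint(0, 2, size=200)   # hypothetical binary labels (1 = outlier)
df = rng.normal(size=200)              # hypothetical decision_function values

# Both transformations rank the samples identically, so the AUROC matches.
auc_negation = roc_auc_score(y_true, -df)
auc_max_offset = roc_auc_score(y_true, df.max() - df)
assert np.isclose(auc_negation, auc_max_offset)
print(auc_negation, auc_max_offset)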

Use predict_proba to calculate the scores or probabilities needed as y_score; the error occurs because y_score must be continuous scores rather than hard predictions. You can compute them as shown in the following code:

from sklearn import svm
from sklearn.metrics import accuracy_score, roc_curve, auc

# Classifier - Algorithm - SVM
# fit the training dataset on the classifier
SVM = svm.SVC(C=1.0, kernel='linear', degree=3, gamma='auto', probability=True)
SVM.fit(Train_X_Tfidf, Train_Y)

# predict the labels on the validation dataset
predictions_SVM = SVM.predict(Test_X_Tfidf)

# use accuracy_score to get the accuracy (computed from hard labels)
print("SVM Accuracy Score -> ", accuracy_score(predictions_SVM, Test_Y))

# use the predicted probability of the positive class as the ROC score
probs = SVM.predict_proba(Test_X_Tfidf)
preds = probs[:, 1]
fpr, tpr, threshold = roc_curve(Test_Y, preds)
print("SVM Area under curve -> ", auc(fpr, tpr))

Note the difference between accuracy_score and auc(): for the AUC you need the scores of the predictions, not the predicted labels.
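
The original question uses OneClassSVM, which has no predict_proba. A hedged sketch of the same idea for that case (clf, X_test, and y_test are placeholders for your own fitted model and data, with y_test assumed to be 1 for outliers and 0 for inliers) feeds the continuous decision-function scores into roc_curve instead:

import matplotlib.pyplot as plt
from sklearn.metrics import roc_curve, auc

# clf is assumed to be a fitted OneClassSVM; X_test/y_test are your own data.
df = clf.decision_function(X_test).ravel()
y_score = df.max() - df   # larger score = more outlying

fpr, tpr, thresholds = roc_curve(y_test, y_score)
roc_auc = auc(fpr, tpr)

plt.plot(fpr, tpr, label='ROC curve (area = %0.2f)' % roc_auc)
plt.xlabel('False positive rate')
plt.ylabel('True positive rate')
plt.legend(loc='lower right')
plt.show()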


Shaina Raza