1

I am building a model with 12 parameters and {0, 1} labels using logistic regression in sklearn. I need to be very confident about label 0; I am OK if some '0' samples are misclassified as 1. The purpose is that I would like to exclude data from processing if it is classified as 0.

How can I tune the parameters?

Gambit1614
Vitaliy
  • Your statement `I need to be very confident about label 0, I am ok if some '0' will be misclassified to 1.` is contradictory. First you say that you need to be pretty sure about label 0, and then you say it's okay to misclassify it. – Gambit1614 Sep 21 '17 at 06:10
  • Sorry for the confusion. In other words, I want to be sure that if I get 0 for test data, the probability is very high, near 99%, but if I get 1, I am OK with a lower probability. Does that make sense? – Vitaliy Sep 21 '17 at 17:14

1 Answer

2

You are basically looking for specificity, which is defined as TN/(TN+FP), where TN is the number of true negatives and FP is the number of false positives. You can read more about this in this blog post and in more detail here. To implement this, use make_scorer together with the confusion_matrix metric in sklearn as follows:

from sklearn.metrics import confusion_matrix
from sklearn.metrics import make_scorer

def get_TN_rate(y_true, y_pred):
    # Specificity: the fraction of actual negatives (label 0)
    # that the classifier correctly predicted as negative.
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
    specificity = float(tn) / (float(tn) + float(fp))
    return specificity

tn_rate = make_scorer(get_TN_rate,greater_is_better=True)

Now you can use tn_rate as a scoring function to train your classifier.
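As a sketch of how the scorer plugs into model selection: the snippet below passes it to GridSearchCV so the regularization strength of the logistic regression is tuned to maximize specificity. The synthetic dataset, the `C` grid, and the use of GridSearchCV are illustrative assumptions standing in for your 12-feature data, not part of the original answer.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix, make_scorer
from sklearn.model_selection import GridSearchCV

def get_TN_rate(y_true, y_pred):
    # Specificity: fraction of actual negatives predicted as negative.
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
    return float(tn) / (float(tn) + float(fp))

tn_rate = make_scorer(get_TN_rate, greater_is_better=True)

# Synthetic stand-in for the asker's 12-feature dataset.
X, y = make_classification(n_samples=500, n_features=12, random_state=0)

# Tune C (inverse regularization strength) against specificity.
grid = GridSearchCV(LogisticRegression(max_iter=1000),
                    param_grid={"C": [0.01, 0.1, 1.0, 10.0]},
                    scoring=tn_rate, cv=5)
grid.fit(X, y)
print(grid.best_params_, grid.best_score_)
```

Separately, since the comments mention wanting near-99% confidence for label 0, note that a fitted LogisticRegression also exposes predict_proba, so you can keep the default decision rule and only treat a sample as 0 when its class-0 probability exceeds a strict threshold of your choosing.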

Gambit1614