
I have trained a classifier in Python and I want to compute the true positives, true negatives, false positives, and false negatives when classifying new data. The issue is that my true labels always consist of a single value, because in the problem I am investigating there is only one label, and I want to see how well the classifier recognizes this label on new data. E.g.:

labels_true = [2, 2, 2, 2, ..., 2]
labels_predicted = [2, 2, 23, 2, 2, 2, 2, 21, ..., 2, 2, 2, 2]

Of course `len(labels_true) == len(labels_predicted)`. Since I have only one true label, how can I calculate the above metrics?

azal

1 Answer


If your `labels_true` consists of positive values only, you can find only true positives (TP) and false negatives (FN), because there are no negative examples that could be correctly rejected (true negatives, TN) or wrongly accepted (false positives, FP).
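In that degenerate case the two remaining counts can be computed directly. A minimal sketch, using variable names that mirror the question:

```python
# All true labels are the single class of interest (2),
# so only TP (predicted 2) and FN (predicted anything else) exist.
labels_true = [2, 2, 2, 2, 2, 2]
labels_predicted = [2, 2, 23, 2, 21, 2]

tp = sum(1 for t, p in zip(labels_true, labels_predicted) if p == t)
fn = len(labels_true) - tp
print("TP:", tp, "FN:", fn)  # TP: 4 FN: 2
```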

TP, TN, FP, and FN apply to a binary classification problem. Either you analyze the whole confusion matrix, or you bin the classes to get a binary problem.

Here is a solution with binning:

from collections import Counter

truth      = [1, 2, 1, 2, 1, 1, 1, 2, 1, 3, 4, 1]
prediction = [1, 1, 2, 1, 1, 2, 1, 2, 1, 4, 4, 3]

confusion_matrix = Counter()

# say classes 1 and 3 are positive; all other classes are negative
positives = [1, 3]

binary_truth = [x in positives for x in truth]
binary_prediction = [x in positives for x in prediction]
print(binary_truth)
print(binary_prediction)

for t, p in zip(binary_truth, binary_prediction):
    confusion_matrix[t, p] += 1

print("TP: {} TN: {} FP: {} FN: {}".format(
    confusion_matrix[True, True], confusion_matrix[False, False],
    confusion_matrix[False, True], confusion_matrix[True, False]))
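To apply this binning to many data sets, it can be wrapped in a reusable function. This is a sketch; the helper name `binary_counts` is my own, not part of the original answer:

```python
from collections import Counter

def binary_counts(truth, prediction, positives):
    """Return (TP, TN, FP, FN) after binning labels into positive/negative."""
    cm = Counter()
    for t, p in zip(truth, prediction):
        # key is the pair (actual is positive?, predicted as positive?)
        cm[t in positives, p in positives] += 1
    return cm[True, True], cm[False, False], cm[False, True], cm[True, False]

tp, tn, fp, fn = binary_counts(
    [1, 2, 1, 2, 1, 1, 1, 2, 1, 3, 4, 1],
    [1, 1, 2, 1, 1, 2, 1, 2, 1, 4, 4, 3],
    positives={1, 3},
)
print("TP:", tp, "TN:", tn, "FP:", fp, "FN:", fn)  # TP: 5 TN: 2 FP: 2 FN: 3
```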

EDIT: here is a full-blown confusion matrix:

from collections import Counter

truth      = [1, 2, 1, 2, 1, 1, 1, 2, 1, 3, 4, 1]
prediction = [1, 1, 2, 1, 1, 2, 1, 2, 1, 4, 4, 3]

# build the multi-class confusion matrix, keyed by (true, predicted)
confusion_matrix = Counter()
for t, p in zip(truth, prediction):
    confusion_matrix[t, p] += 1

# print the confusion matrix (rows: truth, columns: prediction)
labels = set(truth + prediction)
print("t/p", end=" ")
for p in sorted(labels):
    print(p, end=" ")
print()
for t in sorted(labels):
    print(t, end=" ")
    for p in sorted(labels):
        print(confusion_matrix[t, p], end=" ")
    print()
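From the full matrix you can also recover per-class TP/FP/FN/TN by treating each class one-vs-rest. The loop below is a sketch of that idea (my addition, not part of the original answer), using the same data:

```python
from collections import Counter

truth      = [1, 2, 1, 2, 1, 1, 1, 2, 1, 3, 4, 1]
prediction = [1, 1, 2, 1, 1, 2, 1, 2, 1, 4, 4, 3]

# full multi-class confusion matrix, keyed by (true, predicted)
cm = Counter(zip(truth, prediction))
total = len(truth)

per_class = {}
for c in sorted(set(truth + prediction)):
    tp = cm[c, c]                                              # c predicted as c
    fn = sum(n for (t, p), n in cm.items() if t == c and p != c)  # c missed
    fp = sum(n for (t, p), n in cm.items() if t != c and p == c)  # others predicted as c
    tn = total - tp - fn - fp                                  # everything else
    per_class[c] = (tp, fp, fn, tn)
    print("class", c, "TP:", tp, "FP:", fp, "FN:", fn, "TN:", tn)
```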
stefan
  • Thank you. How can I slightly modify the confusion_matrix index so it can be iterated over many data sets? – azal Mar 31 '15 at 12:24
  • @gelazari: I'm not sure I understand your problem correctly. See my edit. – stefan Mar 31 '15 at 16:23