I have a histogram generated by matplotlib, and I have been using sklearn's metrics module to calculate the precision-recall curve. This is a plot showing the positive predictive value (PPV) as a function of recall. This is the histogram: [histogram image]

The generated curve takes the following form: [precision-recall curve image]

I thought that the negative predictive value (NPV) is the inverse of the PPV, so my guess was to simply compute NPV = 1 - PPV, but that didn't work out. So far I have been using the functions from sklearn's metrics module to generate ROC curves and the precision-recall curve, but I haven't found anything in metrics that computes the negative predictive value. This is the source code I have been using to generate curves from the histogram:


import numpy as np
import matplotlib.pyplot as plt
from sklearn import metrics

# Each file holds one class; the second column contains the classifier scores
data1 = np.loadtxt('1.txt')
data2 = np.loadtxt('2.txt')
x = np.transpose(data1)[1]
y = np.transpose(data2)[1]

# Map the raw scores (apparently in [-1, 1]) onto [0, 1]
background = (1 + y) / 2
signal = (1 + x) / 2

classifier_output = np.concatenate([background, signal])
true_value = np.concatenate([np.zeros_like(background, dtype=int), np.ones_like(signal, dtype=int)])

precision, recall, threshold = metrics.precision_recall_curve(true_value, classifier_output)
plt.plot(recall, precision)
plt.show()

Is there any other way, in metrics or in general, to calculate the NPV for a histogram like this one?

Venkatachalam
sussko

1 Answer

Although it's hard to tell from your plots what precision and recall mean in your setting, we can easily modify your code to compute what you are asking for.

From Wikipedia, precision (a.k.a. positive predictive value, or PPV) is the number of true positives divided by the number of samples predicted as positive (true positives + false positives), while the negative predictive value (NPV) is the number of true negatives divided by the number of samples predicted as negative (true negatives + false negatives). Thus, we can compute the NPV by swapping the positive and negative classes. In code:

import numpy as np
import matplotlib.pyplot as plt
from sklearn import metrics

data1 = np.loadtxt('1.txt')
data2 = np.loadtxt('2.txt')
x = np.transpose(data1)[1]
y = np.transpose(data2)[1]

background = (1 + y) / 2
signal = (1 + x) / 2

classifier_output = np.concatenate([background, signal])
true_value = np.concatenate([np.zeros_like(background, dtype=int), np.ones_like(signal, dtype=int)])

precision, recall, threshold = metrics.precision_recall_curve(true_value, classifier_output)

# Swap the classes: relabel the negatives as "positive" and flip the scores,
# so the "precision" of the swapped problem is the NPV of the original one.
# The second return value is the recall of the negative class (specificity).
npv, tnr, inv_thresh = metrics.precision_recall_curve(1 - true_value, 1 - classifier_output)

plt.plot(recall, precision)
plt.plot(recall, npv[::-1])
plt.show()

Notice that we need to reverse npv to match the ordering of precision. This is because metrics.precision_recall_curve sorts its outputs by threshold, i.e. by the input scores; since we passed classifier_output once and 1 - classifier_output once, the two results come out in opposite order. If you want to check this, plot precision against threshold and npv against inv_thresh.
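To convince yourself that the class-swapped curve really yields NPVs, here is a small self-contained check on synthetic data (the uniform score distributions are made up for the demo, not taken from the question): for every returned threshold, the value from the swapped curve is compared against NPV = TN / (TN + FN) computed directly.

```python
import numpy as np
from sklearn.metrics import precision_recall_curve

rng = np.random.default_rng(0)

# Synthetic scores: background tends toward 0, signal toward 1
background = rng.uniform(0.0, 0.6, 200)
signal = rng.uniform(0.4, 1.0, 200)

classifier_output = np.concatenate([background, signal])
true_value = np.concatenate([np.zeros(200, dtype=int), np.ones(200, dtype=int)])

# Swap classes and flip scores, as in the answer
inv_scores = 1 - classifier_output
npv, _, inv_thresh = precision_recall_curve(1 - true_value, inv_scores)

# At threshold t, "predicted negative" means inv_scores >= t;
# NPV is the fraction of those samples that are truly negative
for t, v in zip(inv_thresh, npv):
    pred_neg = inv_scores >= t
    tn = np.sum(pred_neg & (true_value == 0))
    assert np.isclose(v, tn / pred_neg.sum())

print("all thresholds verified")
```

If any returned value disagreed with the direct TN / (TN + FN) count, the assertion would fail, so a clean run confirms the swap trick.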

DISCLAIMER: I could not try the code I am providing, so some more refinement may be needed
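As a further sanity check of the definitions themselves, PPV and NPV can be computed by hand from a confusion matrix at one fixed cutoff. The tiny label/score arrays and the 0.5 threshold below are made-up illustration values:

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# Made-up labels and scores, binarized at an arbitrary 0.5 cutoff
true_value = np.array([0, 0, 0, 1, 1, 1, 0, 1])
scores = np.array([0.1, 0.4, 0.6, 0.8, 0.7, 0.3, 0.2, 0.9])
pred = (scores >= 0.5).astype(int)

# For binary labels, ravel() yields the counts in the order tn, fp, fn, tp
tn, fp, fn, tp = confusion_matrix(true_value, pred).ravel()

ppv = tp / (tp + fp)  # precision: TP over all predicted positive
npv = tn / (tn + fn)  # NPV: TN over all predicted negative
print(ppv, npv)  # -> 0.75 0.75
```

Each point of the curves above is exactly such a ratio, evaluated once per threshold.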

Corrado