sklearn.metrics.roc_curve only shows 5 fprs, tprs, thresholds

Question

I have an array of length 520, but metrics.roc_curve returns only a few fpr, tpr, and threshold values.

These are some values of my score array:

[... 4.6719894  5.3444934  2.575739   3.5660675  3.4357991  4.195427
4.120169   5.021058   5.308503   5.3124313  4.8253884  4.7469654
5.0011086  5.170149   4.5555115  4.4109273  4.6183085  4.356304
4.413242   4.1186514  5.0573816  4.646429   5.063631   4.363433
5.431669   6.1605806  6.1510544  4.8733225  6.0209446  6.5198536
5.1457767  1.3887328  1.3165888  1.143339   1.717379   1.6670974
1.1816382  1.2497046  1.035109   1.4904765  1.195155   1.2590547
1.0998954  1.6484532  1.5722921  1.2841778  1.1058662  1.3368237
1.3262213  1.215088   1.4224783  1.046008   1.262415   1.2319984
1.2202312  1.1610713  1.2327379  1.1951761  1.8699458  0.98760885
1.6670336  1.5051543  1.2339936  1.5215651  1.534271   1.1805111
1.1587876  1.0894692  1.1936147  1.3278677  1.2409594  1.0499009... ]

And I got only these results:

fpr [0.         0.         0.         0.00204499 0.00204499 1.        ] 
tpr [0.         0.03225806 0.96774194 0.96774194 1.         1.        ] 
threshold [7.5198536 6.5198536 3.4357991 2.5991373 2.575739  0.8769072]

What is the reason for this?

Welcome to SO! Can you clarify what the issue is? What were you expecting to see as far as results, and what is wrong with the results you got? — camelBack, Nov 07 '21 at 16:56

amiola · Accepted Answer · 2021-11-07T14:07:07.447

This is most likely due to the drop_intermediate parameter of roc_curve(), which defaults to True. It drops suboptimal thresholds, i.e. points that would not appear on a plotted ROC curve because they lie on a straight segment between their neighbours (see the roc_curve documentation). To keep every threshold, pass drop_intermediate=False.

Here's an example:

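import numpy as np
try:
    from sklearn.datasets import fetch_openml
    # as_frame=False returns NumPy arrays, so the positional indexing below works
    mnist = fetch_openml('mnist_784', version=1, cache=True, as_frame=False)
    mnist["target"] = mnist["target"].astype(np.int8)
except ImportError:
    # Fallback for old scikit-learn releases that lack fetch_openml
    from sklearn.datasets import fetch_mldata
    mnist = fetch_mldata('MNIST original')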

from sklearn.linear_model import SGDClassifier
from sklearn.model_selection import cross_val_predict

X, y = mnist["data"], mnist["target"]
X_train, X_test, y_train, y_test = X[:60000], X[60000:], y[:60000], y[60000:]
shuffle_index = np.random.permutation(60000)
X_train, y_train = X_train[shuffle_index], y_train[shuffle_index]

y_train_5 = (y_train == 5)
y_test_5 = (y_test == 5)

sgd_clf = SGDClassifier(random_state=42, verbose=0)
sgd_clf.fit(X_train, y_train_5)

y_scores = cross_val_predict(sgd_clf, X_train, y_train_5, cv=3, method='decision_function')

# ROC Curves

from sklearn.metrics import roc_curve

fpr, tpr, thresholds = roc_curve(y_train_5, y_scores)

len(thresholds), len(fpr), len(tpr)
# (3472, 3472, 3472)

# Unlike for precision/recall curves, the length of thresholds (and of the
# other outputs) of roc_curve depends on the drop_intermediate option, which
# drops suboptimal thresholds.

fpr_, tpr_, thrs = roc_curve(y_train_5, y_scores, drop_intermediate=False)
len(fpr_), len(tpr_), len(thrs)
# (60001, 60001, 60001)
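
The same effect can be reproduced without downloading MNIST. The following is only a minimal sketch on synthetic data, not the asker's actual array: the class counts (31 positives, 489 negatives) are an assumption chosen to match the step sizes in the question's output (1/31 ≈ 0.0323 for tpr, 1/489 ≈ 0.00204 for fpr), and the score distributions are made up for illustration.

```python
import numpy as np
from sklearn.metrics import auc, roc_curve

rng = np.random.default_rng(0)

# Hypothetical data: 520 samples with well-separated scores (assumed, not the asker's data)
y_true = np.concatenate([np.ones(31, dtype=int), np.zeros(489, dtype=int)])
y_score = np.concatenate([
    rng.normal(5.0, 1.0, size=31),    # scores for the positive class
    rng.normal(1.3, 0.25, size=489),  # scores for the negative class
])

# Default behaviour: collinear (suboptimal) points are dropped
fpr_d, tpr_d, thr_d = roc_curve(y_true, y_score)

# Keep every threshold
fpr_a, tpr_a, thr_a = roc_curve(y_true, y_score, drop_intermediate=False)

print(len(thr_d), len(thr_a))

# The dropped points lie on straight segments, so the area under the curve is unchanged
print(auc(fpr_d, tpr_d), auc(fpr_a, tpr_a))
```

When the two classes are well separated, as in the question, most points fall on long horizontal or vertical runs of the curve, so the default output can be very short. The plotted curve and the AUC stay the same either way.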
