roc_curve: indices must be integers, not tuple

Question

I am trying to plot a ROC curve and thus calculate the false positive and true positive rates using the sklearn.metrics.roc_curve function.

roc_data = *somedataframeimport*

X_train, X_test, y_train, y_test = split_vect_trans(roc_data)

After vectorizing and transforming my data with my own function here, I fit a NN with it and make predictions.

nn_roc = OneVsRestClassifier(MLPClassifier())
nn_roc = nn_roc.fit(X_train, y_train)

pred = nn_roc.predict(X_test)


fpr = dict()
tpr = dict()
roc_auc = dict()

for i in range(len(y_score)):
    fpr[i], tpr[i], _ = metrics.roc_curve(y_test[:, i], y_score[:, i])
    roc_auc[i] = metrics.auc(fpr[i], tpr[i])

I always get the following error message, though, when passing the data into the roc_curve function.

          5 for i in range(len(y_score)):
    ----> 6     fpr[i], tpr[i], _ = metrics.roc_curve(y_test[:, i], y_score[:, i])
          7     roc_auc[i] = metrics.auc(fpr[i], tpr[i])
TypeError: list indices must be integers, not tuple

I tried adding a line that explicitly converts the input data to an array (a suggestion I read in another post about the same error message). This now gives IndexError: too many indices for array as the error message instead.

y_test_array = np.asarray(y_test)
y_score = np.asarray(pred)

for i in range(len(y_score)):
    fpr[i], tpr[i], _ = metrics.roc_curve(y_test[:, i], y_score[:, i])
    roc_auc[i] = metrics.auc(fpr[i], tpr[i])

No, it's a list that looks like this: [1, 1, 1, 1, 1, 1, 1, ..., 1, 1, 1, 0, 1, 1, 1, 1, 0, 1, 1, 1, 1] That's why I tried adding the np.asarray() part. — Lyyoness, Jun 24 '17 at 13:57
But my question is: you changed y_test to y_test_array, but the code still uses y_test, doesn't it? The error suggests to me that y_test is a list and not a NumPy array. Could you please give me the result of y_test.shape? — Fra93, Jun 24 '17 at 14:00
I missed that, yes. Changing that such that the input is the y_test_array with shape (705,) gives: IndexError: too many indices for array for the same row. Edited the question accordingly. — Lyyoness, Jun 24 '17 at 14:07
Ok, now we are figuring out the problem. You are converting a list into an array. Then you are trying to use two indexes to access a flat array (as if it were a matrix), hence the problem. What we have to understand now is: what was y_test before the "asarray()" call? — Fra93, Jun 24 '17 at 14:10
A list with length 705 like [0, 1, 1, ... 1, 1, 1]. Afterwards it's an array with dimensions (705, ) that looks like this: [1 1 1 1 1 1 1 1 1 0 1 1 1 1 1 1 1 1 1 1 0 1...] — Lyyoness, Jun 24 '17 at 14:13
Ok so, do you understand what addressing an array with arr[:, i] means? It means that "arr" has **two** dimensions and you are taking **all** of axis zero and just element "i" of axis one. But here you have only one axis, so when you try to access it with two indexes, it says that there are too many of them. Do you understand now? — Fra93, Jun 24 '17 at 14:16

score 0 · Answer 1 · answered Jun 24 '17 at 14:29

An array is just a sequence of elements. You can access element i by addressing it with its position in the array:

| a | b | c | ...... | z |

"b" is element 1 (indexing starts from zero)

When you instead have a two-dimensional array (aka a matrix), you need two indexes to uniquely identify an element:

| aa | ab | .... | az |
| ba | bb | .... | bz |
| ca | cb | .... | cz |
| .. | .. | .... | .. |
| za | zb | .... | zz |

"ca" is the element [2,0]

In NumPy you can address an entire row or column by means of ":", so mat[:,0] means "all the rows of column zero" (aka the first column).

Or you can access a range of elements: mat[2:5,0] means "the elements from row 2 to row 4 (inclusive) of the first column of the matrix".
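
As a rough illustration of this indexing, here is a minimal sketch. The names mat, flat and labels are made up for the example and do not come from the question. It shows column selection on a two-dimensional array, and then what happens when the same two-index syntax is applied to a flat array or to a plain list:

```python
import numpy as np

# Hypothetical 4x3 matrix, used only for illustration.
mat = np.arange(12).reshape(4, 3)

first_column = mat[:, 0]      # all rows of column 0
partial_column = mat[1:3, 0]  # rows 1 and 2 of column 0

# A flat array has a single axis, so a second index is one too many.
flat = np.array([1, 1, 0, 1])
try:
    flat[:, 0]
    array_error = None
except IndexError as exc:
    array_error = type(exc).__name__

# A plain Python list rejects the same index with a TypeError instead.
labels = [1, 1, 0, 1]
try:
    labels[:, 0]
    list_error = None
except TypeError as exc:
    list_error = type(exc).__name__

print(first_column, partial_column, array_error, list_error)
```

The two failures correspond to the two messages in the question: the list gives the TypeError, and the converted array gives the IndexError.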

Coming to your problem: you are trying to access a flat array with two indexes. The comma is probably wrong, since you presumably want to access the elements from 0 to "i".
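
If the labels really are a flat array of 0s and 1s, as the comments suggest, one possible way forward is to pass the one-dimensional arrays to roc_curve as they are, with no column index and no per-class loop. The sketch below is only that, a sketch: the y_true and y_score values are invented for illustration, and it assumes that continuous scores are available for the positive class (for example predicted probabilities rather than hard class predictions), which the question does not show.

```python
import numpy as np
from sklearn import metrics

# Hypothetical data: flat arrays of true labels and classifier scores.
y_true = np.array([0, 0, 1, 1])
y_score = np.array([0.1, 0.4, 0.35, 0.8])

# Both inputs are one-dimensional, so they are passed directly,
# without the [:, i] indexing used for per-class columns.
fpr, tpr, thresholds = metrics.roc_curve(y_true, y_score)
roc_auc = metrics.auc(fpr, tpr)

print(y_true.shape, y_score.shape)
print(roc_auc)
```

The [:, i] form only makes sense when the labels and scores have one column per class, that is, when both arrays are two-dimensional.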
