I want to calculate the area under the ROC curve with the RDKit implementation:

rdkit.ML.Scoring.Scoring.CalcAUC(scores, col)

Determines the area under the ROC curve

My code:

import rdkit.ML.Scoring.Scoring

rdkit.ML.Scoring.Scoring.CalcAUC(scores, y)

and I get the following error:

IndexError: invalid index to scalar variable.

My data:

scores

array([32.336, 31.894, 31.74 , ..., -0.985, -1.629, -1.82 ])

y

array(['Inactive', 'Inactive', 'Inactive', ..., 'Inactive', 'Inactive','Inactive'], dtype=object)

I do not know what's wrong.

rnv86
  • Did you check the [documentation](https://www.rdkit.org/docs/source/rdkit.ML.Scoring.Scoring.html)? Admittedly it is not great, but the input should be: ***scores***: a list ordered by descending similarity that contains the active/inactive information, and ***col***: the column index in scores where the active/inactive information is stored. So something like `CalcAUC([(0.8, 1), (0.4, 0)], 1)`. If I were you I would use the scikit-learn implementation if possible. – Oliver Scott Nov 26 '20 at 10:00
  • Thanks! Yes, I have been reading the RDKit documentation, but it has no examples. – rnv86 Nov 27 '20 at 23:18

1 Answer

from rdkit.ML.Scoring.Scoring import CalcAUC
scores = [32.336, 31.894, 31.74, 30., 20.]  # assume scores is sorted in descending order
y = ['Inactive', 'Inactive', 'Inactive', 'Active', 'Inactive']

label_map = {'Active': 1, 'Inactive': 0}
labels = [label_map[y_true] for y_true in y]
auc = CalcAUC(list(zip(scores, labels)), 1)
print('Area Under the ROC Curve:', auc)

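As mentioned in the comment above, CalcAUC expects a list of (score, label) rows sorted by descending score, plus the index of the column that holds the active/inactive flag. Passing the scores array and the label array as two separate arguments makes it index into a single float, which is why the call in the question fails with the IndexError. The documentation for CalcAUC and the other metrics (linked in the comment above) is pretty minimal.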
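If your scores are not already sorted, something like the following might help. It is only a rough sketch: `prepare_rows` and `pairwise_auc` are hypothetical helper names, not part of RDKit, and `pairwise_auc` is a plain rank-based cross-check (the fraction of active/inactive pairs in which the active scores higher), not RDKit's own implementation. The RDKit call at the end is skipped if RDKit is not installed.

```python
def prepare_rows(scores, labels, active_label="Active"):
    """Pair each score with a 0/1 activity flag and sort by descending score.

    Hypothetical helper: builds the list-of-rows layout described above.
    """
    rows = [(float(s), 1 if lab == active_label else 0)
            for s, lab in zip(scores, labels)]
    rows.sort(key=lambda row: row[0], reverse=True)
    return rows


def pairwise_auc(rows, col=1):
    """Rank-based AUC cross-check (not RDKit's code); ties count as 0.5."""
    actives = [r[0] for r in rows if r[col]]
    inactives = [r[0] for r in rows if not r[col]]
    if not actives or not inactives:
        raise ValueError("need at least one active and one inactive row")
    wins = sum(1.0 if a > i else 0.5 if a == i else 0.0
               for a in actives for i in inactives)
    return wins / (len(actives) * len(inactives))


# Same toy values as in the answer, deliberately shuffled.
scores = [20.0, 31.894, 30.0, 32.336, 31.74]
y = ['Inactive', 'Inactive', 'Active', 'Inactive', 'Inactive']

rows = prepare_rows(scores, y)
check = pairwise_auc(rows)
print(rows)
print('Pairwise AUC check:', check)

try:
    from rdkit.ML.Scoring.Scoring import CalcAUC
    # Column 1 of each row holds the 0/1 activity flag.
    print('RDKit CalcAUC:', CalcAUC(rows, 1))
except ImportError:
    pass  # RDKit not available; the pure-Python check above still runs
```

With real data held in NumPy arrays, passing `scores.tolist()` and `y.tolist()` to the helper should work the same way, since it only iterates over the two sequences.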

JoshuaBox