I'm using a multi-class classifier, so to evaluate it after testing I need to compare the classifier's predictions (y_pred) against the true class values (y_test).

But I have them both as 1D arrays, like so:

y_test = [1, 1, 1, 2, 1, 4, 5, 3, ... etc ]
y_pred = [1, 1, 1, 2, 3, 2, 5, 0, ... etc ]

In total I have 46 classes.

But in order to build ROC curves (as in here: http://scikit-learn.org/stable/auto_examples/plot_roc.html), I'm guessing I need the y_test and y_pred to be in a 2D matrix with binary values, of the following shape: number_of_test_cases x number_of_classes.

Each column would represent one class, and a 1 would mean that the sample in that row belongs to that class (for y_test) or was assigned to it by the classifier (for y_pred).

So given the above few values I showed, I understand I need y_test to look something like this:

y_test = [ 1 0 ... 
           1 0
           1 0
           0 1 
           1 0   
           0 1
           0 0
           0 0
           ...

This is what I understand... I hope I'm right!

Is there any numpy function to create such a matrix from a 1D array?

user961627

1 Answer

Have a look at the label_binarize function that's referenced in the example code in your link.
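
As a minimal sketch of what that looks like, assuming scikit-learn is installed and using only the first eight values of y_test from the question: the `classes` array below (labels 0 to 5) is an assumption made for this toy example, and with the real data it would list all 46 class labels. The last lines show a plain NumPy equivalent, since the question asked for one.

```python
import numpy as np
from sklearn.preprocessing import label_binarize

# First eight true labels from the question (the rest are elided there).
y_test = np.array([1, 1, 1, 2, 1, 4, 5, 3])

# Assumption for this toy example: the labels are 0..5.
# With the real data this would list all 46 class labels.
classes = np.arange(6)

# One row per test sample, one column per class, 1 in the column of the label.
y_test_bin = label_binarize(y_test, classes=classes)
print(y_test_bin.shape)

# Plain NumPy equivalent: compare each label against every class by broadcasting.
y_test_np = (y_test[:, None] == classes[None, :]).astype(int)
print(np.array_equal(y_test_bin, y_test_np))
```

Passing `classes` explicitly keeps the column order fixed, so y_test and y_pred end up with the same columns even if some class never appears in one of them.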

Roland Smith
  • Thanks - this solved the problem of 1D to 2D matrix conversion. However, I got another error: http://stackoverflow.com/questions/25133718/making-roc-curve-using-python-for-multiclassification – user961627 Aug 05 '14 at 07:52