I'm using this example to create a ROC plot from SVM classification results: http://scikit-learn.org/0.13/auto_examples/plot_roc.html
However, each data point effectively consists of 4 length-d feature vectors, combined using a custom kernel function that doesn't conform to the specific K(X, X) paradigm. As such, I have to supply a precomputed kernel to scikit-learn in order to do classification. It looks something like this:
import numpy
import sklearn.metrics.pairwise

# v1, v2, v3, v4: arrays, shape (n, d)
# w1, w2, w3, w4: floats in [0, 1), with w1 + w2 + w3 + w4 = 1.0
n = v1.shape[0]
K = numpy.zeros(shape=(n, n))
for v, w in zip((v1, v2, v3, v4), (w1, w2, w3, w4)):
    chi = sklearn.metrics.pairwise.chi2_kernel(v, v)
    mu = 1.0 / numpy.mean(chi)
    K += w * numpy.exp(-mu * chi)
return K
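For context, this is roughly how I then train on the resulting matrix. It's a minimal sketch with synthetic data and only one of the four weighted terms, just to show the `kernel='precomputed'` usage:

```python
import numpy
from sklearn.svm import SVC
from sklearn.metrics.pairwise import chi2_kernel

rng = numpy.random.RandomState(0)
n, d = 30, 5
v = rng.rand(n, d)              # chi2_kernel expects non-negative features
y = numpy.tile([0, 1], n // 2)  # synthetic binary labels

# One term of the weighted sum above (w = 1 for illustration)
chi = chi2_kernel(v, v)
mu = 1.0 / numpy.mean(chi)
K = numpy.exp(-mu * chi)        # shape (n, n)

# Fit directly on the precomputed Gram matrix
clf = SVC(kernel='precomputed', probability=True)
clf.fit(K, y)
```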
The main obstacle to generating a ROC plot (as in the example above) seems to be splitting the data into training and test sets and then calling predict_proba() on the test set. Is it possible to do this in scikit-learn with a precomputed kernel?
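For concreteness, here is the kind of indexing I imagine would be required: slicing rows and columns of K so that training uses train-vs-train similarities and prediction uses test-vs-train similarities. This is an untested sketch with a stand-in linear kernel so it runs on its own; `train_idx` and `test_idx` stand for whatever a splitter would produce:

```python
import numpy
from sklearn.svm import SVC

rng = numpy.random.RandomState(0)
n = 30
X = rng.rand(n, 4)
K = X @ X.T                     # stand-in linear kernel, shape (n, n)
y = numpy.tile([0, 1], n // 2)  # synthetic binary labels

# Split by sample index, then slice the Gram matrix accordingly
idx = rng.permutation(n)
train_idx, test_idx = idx[:20], idx[20:]
K_train = K[numpy.ix_(train_idx, train_idx)]  # train-vs-train, (20, 20)
K_test = K[numpy.ix_(test_idx, train_idx)]    # test-vs-train, (10, 20)

clf = SVC(kernel='precomputed', probability=True)
clf.fit(K_train, y[train_idx])
probas = clf.predict_proba(K_test)            # scores for the ROC curve
```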