
I'm using this example to create a ROC plot from SVM classification results: http://scikit-learn.org/0.13/auto_examples/plot_roc.html

However, each data point effectively consists of 4 length-d feature vectors, combined using a custom kernel function that doesn't conform to the usual K(X, X) paradigm. As such, I have to supply a precomputed kernel to scikit-learn in order to do classification. It looks something like this:

    import numpy
    import sklearn.metrics.pairwise

    def combined_kernel(vs, ws):
        # vs: the four feature arrays v1..v4, each of shape (n, d)
        # ws: the four weights w1..w4, floats in [0, 1),
        #     with w1 + w2 + w3 + w4 = 1.0
        n = vs[0].shape[0]
        K = numpy.zeros((n, n))
        for v, w in zip(vs, ws):
            chi = sklearn.metrics.pairwise.chi2_kernel(v, v)
            mu = 1.0 / numpy.mean(chi)
            K += w * numpy.exp(-mu * chi)
        return K

The main obstacle to generating a ROC plot (from the above link) seems to be the process of splitting the data into two sets, and then calling predict_proba() on the test set. Is it possible to do this in scikit-learn using a precomputed kernel?
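For reference, the core of the linked example looks roughly like this (paraphrased from the 0.13 plot_roc.py; X, y, and n_samples are the raw feature matrix, labels, and sample count there). It's the X[half:] test matrix that I have no direct analogue for:

    from sklearn import svm

    # Paraphrased from the linked plot_roc.py example: split the raw
    # feature matrix in half, train on one half, and compute class
    # probabilities for the other half.
    half = int(n_samples / 2)
    classifier = svm.SVC(kernel='linear', probability=True)
    probas_ = classifier.fit(X[:half], y[:half]).predict_proba(X[half:])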

– Magsol

1 Answer


The short answer is "perhaps not". Have you tried something like the following?

Based on the example at http://scikit-learn.org/stable/modules/svm.html, you need something like this:

    import numpy
    import sklearn.metrics.pairwise
    from sklearn import svm

    # probability=True is required here, otherwise predict_proba()
    # will raise an error
    clf = svm.SVC(kernel='precomputed', probability=True)

    # Training kernel K(train, train), computed as in the question.
    # v1..v4: arrays, shape (n, d)
    # w1..w4: floats in [0, 1), with w1 + w2 + w3 + w4 = 1.0
    K = numpy.zeros((n, n))
    for v, w in ((v1, w1), (v2, w2), (v3, w3), (v4, w4)):
        chi = sklearn.metrics.pairwise.chi2_kernel(v, v)
        mu = 1.0 / numpy.mean(chi)
        K += w * numpy.exp(-mu * chi)

    # "At the moment, the kernel values between all training vectors
    # and the test vectors must be provided," according to the
    # scikit-learn documentation -- this is the problem!
    # scikit-learn wraps LIBSVM, and looking at the LIBSVM README it
    # seems you need kernel values between the test and training data,
    # i.e. K(test, train), something like this:
    # t1..t4: test-set arrays, shape (nt, d)
    Kt = numpy.zeros((nt, n))
    for t, v, w in ((t1, v1, w1), (t2, v2, w2), (t3, v3, w3), (t4, v4, w4)):
        chi = sklearn.metrics.pairwise.chi2_kernel(t, v)
        mu = 1.0 / numpy.mean(chi)
        Kt += w * numpy.exp(-mu * chi)

    # y: the n training labels
    clf.fit(K, y)

    # predict probabilities on the testing examples
    probas_ = clf.predict_proba(Kt)

From here on, just follow the bottom of http://scikit-learn.org/0.13/auto_examples/plot_roc.html to draw the ROC curve.
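For completeness, the tail of that example boils down to something like this (assuming y_test holds the binary labels for the nt test points; probas_[:, 1] is the predicted probability of the positive class):

    import pylab as pl
    from sklearn.metrics import roc_curve, auc

    # Compute the ROC curve and the area under it
    fpr, tpr, thresholds = roc_curve(y_test, probas_[:, 1])
    roc_auc = auc(fpr, tpr)

    # Plot the ROC curve, plus the chance diagonal
    pl.plot(fpr, tpr, label='ROC curve (area = %0.2f)' % roc_auc)
    pl.plot([0, 1], [0, 1], 'k--')
    pl.xlabel('False Positive Rate')
    pl.ylabel('True Positive Rate')
    pl.legend(loc="lower right")
    pl.show()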

– Bull
  • Right, but the problem is you used `X_test`, which I can't create, since each data point consists of 4 distinct d-dimensional feature vectors that are only combined inside the kernel function. I can't split the data into training and testing sets, unless you're advocating creating two Gram matrices, which scikit-learn actually warns against ("results will be unexpected"). – Magsol May 24 '13 at 09:14
  • So if I'm reading this correctly, it's perfectly acceptable to provide a Gram matrix `Kt` to `predict_proba()` that is different from the Gram matrix `K` used to train the SVM? (with the twist that the test vectors in `Kt` need to be compared against the training vectors) – Magsol May 24 '13 at 12:46
  • If they are calculated from the same kernel function: one has to be K(train, train) and the other K(test, train). Some of the calculations above worry me, though; e.g., are all those mu's handled correctly? – Bull May 24 '13 at 13:08
  • I did not realize that! I had assumed (from the scikit-learn documentation as well) that you could only train and test on the same identical Gram matrix. As for the calculations, the formula I'm using (from Nilsback et al., 2008) is: K(i, j) = SUM_f w_f * exp(-mu_f * chi^2(x_f(i), x_f(j))), where f ranges from 1 to 4, one for each feature set. Formally it's a Mercer kernel: a sum of Mercer kernels weighted (`w_f`) to sum to 1. `mu_f` is 1 / (the mean of all the chi^2 distances for that feature set). Did I miss something in my implementation? – Magsol May 24 '13 at 13:18
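For reference, here is a minimal sketch of that formula as stated in the last comment, using a hypothetical helper `nilsback_kernel` and assuming the chi^2 in it means the raw chi-squared *distance*. Note that sklearn's `chi2_kernel` already applies the exponential, while `additive_chi2_kernel` returns the negated chi-squared distance; the difference may be the source of the mu concern above:

    import numpy
    from sklearn.metrics.pairwise import additive_chi2_kernel

    def nilsback_kernel(feature_sets, weights):
        # feature_sets: the four arrays x_1..x_4, each of shape (n, d)
        # weights: the four w_f values, summing to 1.0
        n = feature_sets[0].shape[0]
        K = numpy.zeros((n, n))
        for x, w in zip(feature_sets, weights):
            # additive_chi2_kernel returns -chi^2, so negate it to
            # recover the raw chi-squared distances
            dist = -additive_chi2_kernel(x, x)
            mu = 1.0 / numpy.mean(dist)
            K += w * numpy.exp(-mu * dist)
        return K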