
I'd like to implement my own Gaussian kernel in Python, just as an exercise. I'm using sklearn.svm.SVC(kernel=my_kernel), but I really don't understand what is going on.

I expected the function my_kernel to be called with the columns of the X matrix as parameters; instead, it gets called with X, X as arguments. Looking at the examples doesn't make things any clearer.

What am I missing?

This is my code:

'''
Created on 15 Nov 2014

@author: Luigi
'''
import scipy.io
import numpy as np
from sklearn import svm
import matplotlib.pyplot as plt

def svm_class(fileName):

    data = scipy.io.loadmat(fileName)
    X = data['X']
    y = data['y']

    f = svm.SVC(kernel='rbf', gamma=50, C=1.0)
    f.fit(X,y.flatten())
    plotData(np.hstack((X,y)), X, f)

    return

def plotData(arr, X, f):

    ax = plt.subplot(111)

    ax.scatter(arr[arr[:,2]==0][:,0], arr[arr[:,2]==0][:,1], c='r', marker='o', label='Zero')
    ax.scatter(arr[arr[:,2]==1][:,0], arr[arr[:,2]==1][:,1], c='g', marker='+', label='One')

    h = .02  # step size in the mesh
    # create a mesh to plot in
    x_min, x_max = X[:, 0].min() - 1, X[:, 0].max() + 1
    y_min, y_max = X[:, 1].min() - 1, X[:, 1].max() + 1
    xx, yy = np.meshgrid(np.arange(x_min, x_max, h),
                         np.arange(y_min, y_max, h))


    # Plot the decision boundary. For that, we will assign a color to each
    # point in the mesh [x_min, x_max]x[y_min, y_max].
    Z = f.predict(np.c_[xx.ravel(), yy.ravel()])

    # Put the result into a color plot
    Z = Z.reshape(xx.shape)
    plt.contour(xx, yy, Z)



    plt.xlim(np.min(arr[:,0]), np.max(arr[:,0]))
    plt.ylim(np.min(arr[:,1]), np.max(arr[:,1]))
    plt.show()
    return


def gaussian_kernel(x1,x2):
    sigma = 0.5
    return np.exp(-np.sum((x1-x2)**2)/(2*sigma**2))

if __name__ == '__main__':

    fileName = 'ex6data2.mat'
    svm_class(fileName)
Luigi Tiburzi

2 Answers


After reading the answer below, and some other questions and sites (1, 2, 3, 4, 5), I put this together for a Gaussian kernel in svm.SVC().

Call svm.SVC() with kernel="precomputed".

Then compute a Gram Matrix a.k.a. Kernel Matrix (often abbreviated as K).

Then use this Gram Matrix as the first argument (i.e. X) to svm.SVC().fit():

I start with the following code:

C=0.1
model = svmTrain(X, y, C, "gaussian")

that calls sklearn.svm.SVC() in svmTrain(), and then sklearn.svm.SVC().fit():

from sklearn import svm

if kernelFunction == "gaussian":
    clf = svm.SVC(C=C, kernel="precomputed")
    return clf.fit(gaussianKernelGramMatrix(X,X), y)

the Gram Matrix computation - used as a parameter to sklearn.svm.SVC().fit() - is done in gaussianKernelGramMatrix():

import numpy as np

def gaussianKernelGramMatrix(X1, X2, K_function=gaussianKernel):
    """(Pre)calculates Gram Matrix K"""

    gram_matrix = np.zeros((X1.shape[0], X2.shape[0]))
    for i, x1 in enumerate(X1):
        for j, x2 in enumerate(X2):
            gram_matrix[i, j] = K_function(x1, x2)
    return gram_matrix

which uses gaussianKernel() to get a radial basis function kernel between x1 and x2 (a measure of similarity based on a Gaussian distribution centered on x1, with sigma=0.1):

def gaussianKernel(x1, x2, sigma=0.1):

    # Flatten x1 and x2 into 1-D arrays so the element-wise difference works
    x1 = x1.flatten()
    x2 = x2.flatten()

    sim = np.exp(- np.sum( np.power((x1 - x2),2) ) / float( 2*(sigma**2) ) )

    return sim
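
As a quick sanity check of the point-level kernel (example values only, not from the original post), the similarity equals 1 when x1 == x2 and drops towards 0 as the points move apart:

x1 = np.array([1, 2, 1])
x2 = np.array([0, 4, -1])
print(gaussianKernel(x1, x2, sigma=2.0))  # squared distance is 9, so exp(-9/8) ≈ 0.3247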

Then, once the model is trained with this custom kernel, we predict with "the [custom] kernel between the test data and the training data":

predictions = model.predict( gaussianKernelGramMatrix(Xval, X) )
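
Note the shapes this implies: with m training samples in X and p samples in Xval (both with the same number of features), gaussianKernelGramMatrix(Xval, X) is a p x m matrix, one row per point to predict and one column per training sample, which is exactly what kernel="precomputed" expects at predict time. An illustrative check, using the names above:

K_train = gaussianKernelGramMatrix(X, X)     # (m, m), passed to fit()
K_pred = gaussianKernelGramMatrix(Xval, X)   # (p, m), passed to predict()
assert np.allclose(K_train, K_train.T)       # symmetric
assert np.allclose(np.diag(K_train), 1.0)    # K(x, x) == exp(0) == 1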

In short, to use a custom SVM Gaussian kernel, you can use this snippet:

import numpy as np
from sklearn import svm

def gaussianKernelGramMatrixFull(X1, X2, sigma=0.1):
    """(Pre)calculates Gram Matrix K"""

    gram_matrix = np.zeros((X1.shape[0], X2.shape[0]))
    for i, x1 in enumerate(X1):
        for j, x2 in enumerate(X2):
            x1 = x1.flatten()
            x2 = x2.flatten()
            gram_matrix[i, j] = np.exp(- np.sum( np.power((x1 - x2),2) ) / float( 2*(sigma**2) ) )
    return gram_matrix

X=...
y=...
Xval=...

C=0.1
clf = svm.SVC(C=C, kernel="precomputed")
model = clf.fit( gaussianKernelGramMatrixFull(X,X), y )

p = model.predict( gaussianKernelGramMatrixFull(Xval, X) )
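
The double Python loop above is easy to follow but slow for larger datasets. As a sketch of a vectorized alternative (not part of the original answer; it assumes SciPy is available), the same Gram matrix can be built with a single call to scipy.spatial.distance.cdist:

import numpy as np
from scipy.spatial.distance import cdist

def gaussianKernelGramMatrixVectorized(X1, X2, sigma=0.1):
    """Same Gram matrix as above, without Python-level loops."""
    sq_dists = cdist(X1, X2, 'sqeuclidean')  # pairwise squared distances, (len(X1), len(X2))
    return np.exp(-sq_dists / (2 * sigma**2))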
arturomp
  • What is your "Xval" in this case? Is that the training set to run predictions on? Also, since the inputs to "gaussianKernelGramMatrixFull" have to be the same dimensions, do you manually have to resize X to generate the Gram correctly? – Vinit Nayak Sep 24 '18 at 15:49
  • @VinitNayak not quite - `Xval`s are values whose predictions we want to get. We trained on the training set `x` with the known labels `y`, but we don't know what the labels for `Xval` are, so we predict them. – arturomp Sep 24 '18 at 22:25
  • @VinitNayak `X` and `Xval` don't need to be the same size; they just have to have the same number of columns (the `n` in an `m` by `n` matrix), as you would expect of your training data and your data-to-predict (i.e. they needn't have the same number of samples, but they should have the same number of features) – arturomp Sep 24 '18 at 22:42
  • My `X` is a 622x9 matrix, `y` is 622x1 matrix, `Xval` (which is my cross validation set) is 266x9. When I run the above, I get the following error when predicting: `svm/base.py", line 455, in _validate_for_predict (X.shape[1], self.shape_fit_[0])) ValueError: X.shape[1] = 9 should be equal to 622, the number of samples at training time` – Vinit Nayak Sep 27 '18 at 04:46
  • You have 622 samples with 9 features in your X? In that case, you have 266 test data points, also with 9 features. You probably just need to transpose the array. You can do this in numpy as `my_array.T` – arturomp Oct 05 '18 at 18:39

For efficiency reasons, SVC assumes that your kernel is a function accepting two matrices of samples, X and Y (it will use two identical ones only during training), and that it returns a matrix G where:

G_ij = K(X_i, Y_j)

and K is your "point-level" kernel function.

So either implement a Gaussian kernel that works in this generic, matrix-level way, or add a "proxy" function like:

import numpy as np

def proxy_kernel(X, Y, K):
    """Builds the Gram matrix by applying the point-level kernel K pairwise."""
    gram_matrix = np.zeros((X.shape[0], Y.shape[0]))
    for i, x in enumerate(X):
        for j, y in enumerate(Y):
            gram_matrix[i, j] = K(x, y)
    return gram_matrix

and use it like:

from functools import partial
correct_gaussian_kernel = partial(proxy_kernel, K=gaussian_kernel)
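
The resulting callable can then be handed straight to svm.SVC, which will call it with two sample matrices and use the returned Gram matrix. A minimal usage sketch (assuming X, y and Xval as in the other answer):

from sklearn import svm

clf = svm.SVC(kernel=correct_gaussian_kernel)
clf.fit(X, y)                     # internally evaluates correct_gaussian_kernel(X, X)
predictions = clf.predict(Xval)   # internally evaluates the kernel between Xval and X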
lejlot
  • Excuse me, but I don't understand, probably because I am a beginner in ML. Shouldn't it be enough to return a square matrix with n = n_rows(X), the elements being all the combinations between X[:,0] and X[:,1]? – Luigi Tiburzi Nov 17 '14 at 10:38
  • As I said, it won't always be called with two X's as arguments, so no. Furthermore, your data is multidimensional, so X[:,0] is just the first dimension of each vector, which is not what is needed here. – lejlot Nov 17 '14 at 23:09
  • And what should I return? – Luigi Tiburzi Nov 19 '14 at 17:31
  • A Gram matrix, exactly as written in the answer: a matrix whose entry in the i'th row and j'th column is the kernel value between the i'th vector from the first array and the j'th from the second one, so kernel(X,Y)_{ij} = K(X_i, Y_j) – lejlot Nov 19 '14 at 18:11
  • @lejlot my question might be directly related because I am using self-computed kernel. Can you please have a look https://stackoverflow.com/questions/47564504/in-self-compute-kernelx-raise-valueerrorx-shape0-should-be-equal-to-x-sha – Mona Jalal Nov 30 '17 at 03:32