0

I am confused. The sklearn implementation of a kernel SVM does not arrive at the same optimum as manually solving the problem does. In fact, the two solutions are completely different. What is going on?

Let me provide an example. Below, we generate data distributed along two concentric circles. There is some small Gaussian noise added to it. Then, we label one circle as +1 and the other one as -1. The goal is to classify correctly.

### GENERATE DATA

import numpy as np
import matplotlib.pyplot as plt

r1 = 0.1
r2 = 1.0
n_points = 150

t = np.linspace(0, 2*np.pi, n_points)
x1 = r1 * np.cos(t)
y1 = r1 * np.sin(t)

t = np.linspace(0, 2*np.pi, n_points)
x2 = r2 * np.cos(t)
y2 = r2 * np.sin(t)

X1 = np.vstack((x1,y1)).T
X2 = np.vstack((x2,y2)).T
X = np.vstack((X1,X2))
y = np.concatenate((np.ones(n_points),-np.ones(n_points)))

mean = np.array([0, 0])
covariance = np.diag([0.01, 0.01])
noise = np.random.multivariate_normal(mean, covariance, 2*n_points)

X = X + noise

plt.scatter(X[:, 0], X[:, 1], c=y, s=50, cmap='viridis')

Now let's try sklearn. We manually build the kernel matrix because we will need it later anyway.

from sklearn.metrics.pairwise import pairwise_kernels
from sklearn.svm import SVC

def rbf(x,y,gamma):
    z = np.exp(-gamma*np.sum((x-y)**2))
    return(z)

n_features = X.shape[1]

Kmat = pairwise_kernels(X, metric=rbf, gamma=1/n_features)

svm = SVC(kernel='precomputed', C=1)

fit = svm.fit(Kmat, y)

v = np.zeros(2*n_points)
v[fit.support_] = fit.dual_coef_

### Remember that sklearn returns y_i alpha_i and not alpha_i
alpha = v/y 

Now, we compare with CVXOPT. The results are vastly different.

import cvxopt as opt
from cvxopt import matrix as cvxopt_matrix
from cvxopt import solvers as cvxopt_solvers

def cvx_svm(Kmat, y, C):
    
    n_samples = len(y)
    P = cvxopt_matrix(np.outer(y,y) * Kmat)
    q = cvxopt_matrix(np.ones(n_samples) * -1)
    A = cvxopt_matrix(y, (1,n_samples))
    b = cvxopt_matrix(0.0)
    tmp1 = np.diag(np.ones(n_samples) * -1)
    tmp2 = np.identity(n_samples)
    G = cvxopt_matrix(np.vstack((tmp1, tmp2)))
    tmp1 = np.zeros(n_samples)
    tmp2 = np.ones(n_samples) * C
    h = cvxopt_matrix(np.hstack((tmp1, tmp2)))

    sol = cvxopt_solvers.qp(P, q, G, h, A, b)
    opt = np.array(sol['x'])
    
    return(opt)

alpha_manual = cvx_svm(Kmat, 1.0*y, C=1)

np.linalg.norm(alpha_manual - alpha)

The final output is far from zero!!!

Claudio Moneo
  • 489
  • 1
  • 4
  • 10
  • It's the kernel similarity between x and y – Claudio Moneo Jul 17 '23 at 18:16
  • explain what def rbf(x,y,gamma): z = np.exp(-gamma*np.sum((x-y)**2)) return(z) n_features = X.shape[1] Kmat = pairwise_kernels(X, metric=rbf, gamma=1/n_features) does – Golden Lion Jul 18 '23 at 14:03
  • We define the Gaussian rbf kernel, set a typical value for its parameter gamma as 1/n_features and then compute the kernel matrix Kmat over all of the data. I don't understand the reference to train_test_split. But in this example, we need the kernel matrix explicitly (usually we don't in sklearn). The expected output of the svm is indeed -1 and +1, since we are doing binary classification. Have also added a reference to kernel SVMs as mentioned in sklearn. I hope this helps! – Claudio Moneo Jul 18 '23 at 15:15
  • CVXOPT is used to extract features. Why do you thing the CVXOPT which extracts features would be the same as svm which classifies – Golden Lion Jul 27 '23 at 03:55
  • It solves the same optimization problem – Claudio Moneo Jul 27 '23 at 05:09
  • They are not. The cvxopt formulation is mathematically equivalent – Claudio Moneo Jul 28 '23 at 08:46

0 Answers0