I am confused. The sklearn implementation of a kernel SVM does not arrive at the same optimum as manually solving the problem does. In fact, the two solutions are completely different. What is going on?
Let me provide an example. Below, we generate data distributed along two concentric circles. There is some small Gaussian noise added to it. Then, we label one circle as +1 and the other one as -1. The goal is to classify correctly.
### GENERATE DATA
import numpy as np
import matplotlib.pyplot as plt
r1 = 0.1
r2 = 1.0
n_points = 150
t = np.linspace(0, 2*np.pi, n_points)
x1 = r1 * np.cos(t)
y1 = r1 * np.sin(t)
t = np.linspace(0, 2*np.pi, n_points)
x2 = r2 * np.cos(t)
y2 = r2 * np.sin(t)
X1 = np.vstack((x1,y1)).T
X2 = np.vstack((x2,y2)).T
X = np.vstack((X1,X2))
y = np.concatenate((np.ones(n_points),-np.ones(n_points)))
mean = np.array([0, 0])
covariance = np.diag([0.01, 0.01])
noise = np.random.multivariate_normal(mean, covariance, 2*n_points)
X = X + noise
plt.scatter(X[:, 0], X[:, 1], c=y, s=50, cmap='viridis')
Now let's try sklearn. We manually build the kernel matrix because we will need it later anyway.
from sklearn.metrics.pairwise import pairwise_kernels
from sklearn.svm import SVC
def rbf(x,y,gamma):
z = np.exp(-gamma*np.sum((x-y)**2))
return(z)
n_features = X.shape[1]
Kmat = pairwise_kernels(X, metric=rbf, gamma=1/n_features)
svm = SVC(kernel='precomputed', C=1)
fit = svm.fit(Kmat, y)
v = np.zeros(2*n_points)
v[fit.support_] = fit.dual_coef_
### Remember that sklearn returns y_i alpha_i and not alpha_i
alpha = v/y
Now, we compare with CVXOPT. The results are vastly different.
import cvxopt as opt
from cvxopt import matrix as cvxopt_matrix
from cvxopt import solvers as cvxopt_solvers
def cvx_svm(Kmat, y, C):
n_samples = len(y)
P = cvxopt_matrix(np.outer(y,y) * Kmat)
q = cvxopt_matrix(np.ones(n_samples) * -1)
A = cvxopt_matrix(y, (1,n_samples))
b = cvxopt_matrix(0.0)
tmp1 = np.diag(np.ones(n_samples) * -1)
tmp2 = np.identity(n_samples)
G = cvxopt_matrix(np.vstack((tmp1, tmp2)))
tmp1 = np.zeros(n_samples)
tmp2 = np.ones(n_samples) * C
h = cvxopt_matrix(np.hstack((tmp1, tmp2)))
sol = cvxopt_solvers.qp(P, q, G, h, A, b)
opt = np.array(sol['x'])
return(opt)
alpha_manual = cvx_svm(Kmat, 1.0*y, C=1)
np.linalg.norm(alpha_manual - alpha)
The final output is far from zero!!!