
We are training a one-class SVM using scikit-learn's OneClassSVM, which is a wrapper around libsvm. When we run with verbose=True, it reports the number of bounded support vectors, nBSV = 106, in the output below.

>>> clf = svm.OneClassSVM(nu=0.75, kernel="linear", verbose=True, shrinking=True, tol=0.00001)
>>> clf.fit(x)
[LibSVM].*
optimization finished, #iter = 392
obj = 182.273953, rho = 1.831054
nSV = 260, nBSV = 106

Now, if we evaluate on the training set, we get 186 negatives, which is more than the 106 bounded support vectors reported above.

>>> y=clf.predict(x)
>>> np.bincount(y.astype(np.int64)+1)
array([186,   0,  98])
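
The +1 shift maps the -1/+1 predictions into bins 0 and 2, so the first entry counts the negatives; the same figure can be read off directly:

>>> (y == -1).sum()
186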

According to my understanding of SVMs, this should be impossible. As long as there is a nonzero margin, the training errors should be a subset of the bounded support vectors: the bounded support vectors are the training instances on the wrong side of the margin, while the training errors are the instances on the wrong side of the learned separator, which lies inside the margin.
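
One way to probe this from the Python side is to pull the bounded support vectors out of the fitted model and compare them with the training errors. A rough diagnostic sketch, assuming libsvm's internal one-class scaling in which each dual coefficient is capped at 1, so that bounded SVs are those whose coefficient sits at the cap (up to numerical tolerance):

>>> alpha = np.abs(clf.dual_coef_).ravel()          # dual coefficients of the support vectors
>>> bounded = clf.support_[np.isclose(alpha, 1.0)]  # indices of bounded SVs (coefficient at the cap)
>>> errors = np.flatnonzero(clf.predict(x) == -1)   # training points on the wrong side of the separator
>>> len(bounded), len(errors), np.isin(errors, bounded).all()

If the subset relation held, the last value would be True and len(bounded) would be at least len(errors).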

While the actual numbers vary, this observation seems robust with respect to the settings for this data set. I have even seen nBSV = 0 with the majority of training samples misclassified.

Can somebody explain how this could be happening?

Daniel Mahler
  • I think we need more details. 186+98 = 284 is less than the total number of SVs reported in your first part. Perhaps the Libsvm output isn't counting exactly what you think it is. – Raff.Edward Mar 29 '16 at 21:51
  • @Raff.Edward Total number of support vectors is given by `nSV=260`. That means there are 154 free support vectors and 24 non-support vectors. 260 is somewhat higher than expected, since 284 * 0.75 = 213. – Daniel Mahler Mar 29 '16 at 22:03

0 Answers