
I am trying to recreate an SVM object in R from a PMML file, but I am having trouble understanding how R stores the alpha coefficients. I am currently testing on the iris data set, and I generated an R SVM object with the command

library(e1071)
data(iris)
model<-svm(Species~.,data=iris)

and I am looking at its coefficients with the command

model$coefs

to get the following result

            [,1]        [,2]
 [1,]  0.0890967  0.00000000
 [2,]  0.0000000  0.14547777
 [3,]  0.8651998  0.94869969
 [4,]  0.0000000  0.13152589
 [5,]  0.0000000  0.27612243
 [6,]  0.8421469  0.45912899
 [7,]  0.4785865  0.00000000
 [8,]  1.0000000  1.00000000
 [9,] -0.4941407  1.00000000
[10,]  0.0000000  1.00000000
[11,]  0.0000000  0.63848160
[12,]  0.0000000  1.00000000
[13,]  0.0000000  1.00000000
[14,] -0.5471576  0.00000000
[15,]  0.0000000  0.52796849
[16,] -0.3772321  0.49504241
[17,]  0.0000000  1.00000000
[18,]  0.0000000  1.00000000
[19,] -0.1146136  1.00000000
[20,]  0.0000000  1.00000000
[21,]  0.0000000  1.00000000
[22,]  0.0000000  1.00000000
[23,]  0.0000000  1.00000000
[24,]  0.0000000  1.00000000
[25,]  0.0000000  1.00000000
[26,]  0.0000000  1.00000000
[27,] -0.7418858  0.10024212
[28,]  0.0000000  1.00000000
[29,]  0.0000000  0.60104219
[30,] -1.0000000  0.00000000
[31,] -0.8335805 -1.00000000
[32,]  0.0000000 -0.05538514
[33,]  0.0000000 -1.00000000
[34,]  0.0000000 -1.00000000
[35,] -0.6171002  0.00000000
[36,] -0.3564736 -1.00000000
[37,]  0.0000000 -1.00000000
[38,]  0.0000000 -1.00000000
[39,]  0.0000000 -1.00000000
[40,]  0.0000000 -1.00000000
[41,]  0.0000000 -1.00000000
[42,]  0.0000000 -1.00000000
[43,] -0.6609450 -0.78275762
[44,]  0.0000000 -1.00000000
[45,]  0.0000000 -1.00000000
[46,]  0.0000000 -1.00000000
[47,]  0.0000000 -1.00000000
[48,]  0.0000000 -0.52463404
[49,]  0.0000000 -1.00000000
[50,] -0.4928554  0.00000000
[51,]  0.0000000 -1.00000000

As I understand it, there are 51 support vectors, and since R uses one-versus-one for multi-class SVM, there are essentially 3 classifiers (setosa vs. versicolor, setosa vs. virginica, and versicolor vs. virginica), each of which uses a subset of these vectors. How do I know which coefficients in this coefs matrix correspond to which classifier (and which support vectors each classifier uses)?

I saw that model$nSV tells you how many support vectors are in each classifier, but it does not specify which support vectors are actually part of the classifier. Thanks in advance.

Yes, the way libsvm (the library that R's e1071 package uses) stores the support vectors is a bit "cryptic". To understand it better, let's use only the petal features, so we can visualize the result later.

library(e1071)
data(iris)

fit=svm(Species~Petal.Length+Petal.Width, data=iris, kernel = "linear", cost = 10, scale=FALSE)

The products of each alpha and its label ("alphas times ys") are stored in the coefs matrix. To know how many support vectors belong to each class, look at:

n = fit$nSV; n

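In my run these are 1, 8 and 8. The support vectors are stored grouped by class: the first n[1] rows (here just 1) belong to class 1, the next n[2] rows to class 2, and the last n[3] rows to class 3. For the first n[1] rows, the columns are [1vs2, 1vs3]. For the next n[2] rows, the columns are [2vs1, 2vs3]. For the last n[3] rows, they are [3vs1, 3vs2]. In general, with k classes coefs has k - 1 columns. Note that some values might be 0, because a support vector of one binary classifier need not be a support vector of the other classifiers its class takes part in. In my run, 7 of the 8 class 2 values in column 1 are 0, since you only need 1 point to separate classes 1 and 2.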
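The row/column rule above can be written as a small base-R helper. This is a sketch: `ovo_layout` is a hypothetical name, not an e1071 function, and the column rule it encodes (for classes i < j, class i rows use column j - 1 and class j rows use column i) is simply the pattern described above, which agrees with the 1 vs. 3 extraction code that follows.

```r
# Hypothetical helper (not part of e1071): for the one-vs-one classifier
# between classes i < j, return which rows of coefs/SV belong to it and
# which column of coefs holds each class's coefficients.
ovo_layout <- function(nSV, i, j) {
  stopifnot(i < j, j <= length(nSV))
  ends   <- cumsum(nSV)          # last row of each class's block of SVs
  starts <- ends - nSV + 1       # first row of each class's block
  list(
    rows_i = seq(starts[i], length.out = nSV[i]),
    col_i  = j - 1,              # class i rows: column j - 1 is "i vs j"
    rows_j = seq(starts[j], length.out = nSV[j]),
    col_j  = i                   # class j rows: column i is "j vs i"
  )
}

# With the counts from the run above (1, 8 and 8), class 1 vs. class 3:
lay <- ovo_layout(c(1, 8, 8), 1, 3)
lay$rows_i  # row 1
lay$col_i   # column 2
lay$rows_j  # rows 10 to 17
lay$col_j   # column 1
```

With a fitted model, `fit$coefs[lay$rows_i, lay$col_i]` and `fit$coefs[lay$rows_j, lay$col_j]` would then pick out the same values as the `coef1` line below.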

If we want to extract the 1 vs. 3 separating hyperplane (a line here, since there are only two features), we need to do it as follows:

# class 1 vs. 3
# class 1 has n[1] SV, class 3 has n[3]
# rows of n[1], column 2 = [1vs2, 1vs3*]
# rows of n[3], column 1 = [3vs1*, 3vs2]
coef1 = c(fit$coefs[1:n[1],2],fit$coefs[(sum(n[1:2])+1):sum(n),1])
SVs1 = rbind(fit$SV[1:n[1],],fit$SV[(sum(n[1:2])+1):sum(n),])
w1 = t(SVs1)%*%coef1
# rho stores the b's, [1vs2, 1vs3, 2vs3]
b1 = -fit$rho[2]
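Why these lines give the separating line, as a sketch of the decision rule (notation is mine: c_s is the coefs entry of support vector x_s in the relevant column, S_i is the set of class i rows, and rho_ij is the matching entry of fit$rho). For classes i < j, the decision value uses only the support vectors of those two classes:

```latex
f_{ij}(x) = \sum_{s \in S_i} c_{s,\,j-1}\, K(x_s, x) + \sum_{s \in S_j} c_{s,\,i}\, K(x_s, x) - \rho_{ij}

% linear kernel K(u, v) = u^\top v, classes (i, j) = (1, 3):
f_{13}(x) = w^\top x + b, \quad
w = \sum_{s \in S_1} c_{s,2}\, x_s + \sum_{s \in S_3} c_{s,1}\, x_s, \quad
b = -\rho_{13}

% boundary f_{13}(x) = 0 with x = (x_1, x_2):
x_2 = -\frac{b}{w_2} - \frac{w_1}{w_2}\, x_1
```

Here w is exactly t(SVs1) %*% coef1, b is -fit$rho[2], and the last line is the intercept and slope passed to abline. As far as I know, libsvm counts f_ij(x) > 0 as a vote for class i.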

Plotting (only classes 1 and 3):

iris.norm = iris[, c('Petal.Length', 'Petal.Width')]
plot(rbind(iris.norm[1:50,],iris.norm[101:150,]), col=iris$Species[c(1:50, 101:150)])
abline(-b1/w1[2], -w1[1]/w1[2], col=4)

[Image: the class 1 and class 3 points with the fitted 1 vs. 3 separating line]
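The w/b shortcut only works for the linear kernel. The model in the question uses e1071's default radial kernel, for which no explicit w exists, so the decision value has to be computed as a weighted sum of kernel evaluations against the support vectors, minus rho. Below is a minimal sketch, assuming the coefficients and support vectors of one classifier have already been selected the same way coef1 and SVs1 were, and that gamma is taken from fit$gamma. `rbf_decision` is a hypothetical helper, not part of e1071. If the model was fit with the default scale = TRUE, fit$SV holds scaled values, so x must be centered and scaled the same way first (e1071 keeps the centers and scales in fit$x.scale).

```r
# Hypothetical helper (not an e1071 function): decision value of one
# one-vs-one classifier under the radial kernel exp(-gamma * |u - v|^2).
#   x     : numeric vector, one observation (in the same units as SVs)
#   SVs   : support vectors of the two classes, one per row (e.g. SVs1)
#   coef  : matching coefficients, alpha times y (e.g. coef1)
#   rho   : matching rho entry (e.g. fit$rho[2])
#   gamma : kernel parameter (e.g. fit$gamma)
rbf_decision <- function(x, SVs, coef, rho, gamma) {
  SVs <- as.matrix(SVs)
  # squared Euclidean distance from x to each support vector
  d2 <- rowSums(sweep(SVs, 2, x)^2)
  sum(coef * exp(-gamma * d2)) - rho
}

# Toy check with made-up numbers: two support vectors, one per class.
SVs  <- rbind(c(0, 0), c(2, 0))
coef <- c(1, -1)
rbf_decision(c(0, 0), SVs, coef, rho = 0, gamma = 0.5)  # 1 - exp(-2), about 0.865
```

A positive value would count as a vote for the first class of the pair; running this for all three pairs and counting votes reproduces the one-vs-one prediction.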

Maverick Meerkat