The following code should create a support vector classifier (SVM with linear kernel) using the ksvm function from the kernlab package:

library(kernlab)
set.seed(1)
x <- rbind(matrix(rnorm(10 * 2, mean = 0), ncol = 2),
           matrix(rnorm(10 * 2, mean = 2), ncol = 2))
y <- c(rep(-1, 10), rep(1, 10))
svc <- ksvm(x, y, type = "C-svc", kernel = "vanilladot")
plot(svc, data = x)

The resulting graph:

[Image: SVM plot]

If my understanding is correct, the black shapes are the support vectors, which are the data points that lie inside or on the boundary of the margin.

So what's up with the topmost black dot? There are three open dots (so not support vectors) that are closer to the decision boundary. (Two are nearby and easy to see. The third is harder to see unless you zoom in on the picture, but it's the one furthest to the right.)

Either there is a bug in the implementation here or I'm missing something conceptual about the way this is supposed to work. Any insights?

Sean Raleigh

1 Answer

There's nothing wrong with your results. The 6 support vectors are indeed the points closest to your decision surface (a line, in your case). I admit that the shading in the plot you're showing looks a bit odd. Could this be an optical artefact?
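If you'd rather check this in kernlab directly instead of reading it off the plot, something along these lines might work. Treat it as a rough sketch and not a verified recipe: `scaled = FALSE` is my own assumption (`ksvm` scales the data by default, which makes the fitted coefficients harder to compare with the raw coordinates), and the names `sv_idx`, `w_k`, `b_k` and `f_k` are just illustrative.

```r
library(kernlab)

# Same sample data as in the question
set.seed(1)
x <- rbind(matrix(rnorm(10 * 2, mean = 0), ncol = 2),
           matrix(rnorm(10 * 2, mean = 2), ncol = 2))
y <- c(rep(-1, 10), rep(1, 10))

# Assumption: scaled = FALSE keeps the model in the raw data coordinates;
# the call in the question uses the default, scaled = TRUE.
svc <- ksvm(x, y, type = "C-svc", kernel = "vanilladot", scaled = FALSE)

# Row indices of the support vectors, in the same order as coef(svc)[[1]]
sv_idx <- alphaindex(svc)[[1]]

# Normal vector and offset of the linear decision function w . x - b
w_k <- colSums(coef(svc)[[1]] * x[sv_idx, , drop = FALSE])
b_k <- b(svc)

# Decision value of every training point, with support vectors flagged
f_k <- drop(x %*% w_k) - b_k
data.frame(row = seq_len(nrow(x)), y = y, f = round(f_k, 3),
           is_sv = seq_len(nrow(x)) %in% sv_idx)
```

Points with a decision value between -1 and 1 lie on or inside the margin, so those are the rows you would expect to see flagged as support vectors.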

Let's reproduce your results using svm from the e1071 package (since I'm more familiar with e1071 than with kernlab).

  1. Here is your sample data.

    # Sample data
    set.seed(1)
    x <- rbind(matrix(rnorm(10 * 2, mean = 0), ncol = 2),
               matrix(rnorm(10 * 2, mean = 2), ncol = 2))
    y <- c(rep(-1, 10), rep(1, 10))
    df <- data.frame(x = x, y = as.factor(y));
    
  2. Let's use svm as a classification machine using a linear kernel. scale = FALSE ensures that data are not scaled.

    library(e1071);
    fit <- svm(y ~ ., data = df, kernel = "linear", type = "C-classification", scale = FALSE);
    fit;
    #
    #Call:
    #svm(formula = y ~ ., data = df, kernel = "linear", type = "C-classification",
    #    scale = FALSE)
    #
    #
    #Parameters:
    #   SVM-Type:  C-classification
    # SVM-Kernel:  linear
    #       cost:  1
    #      gamma:  0.5
    #
    #Number of Support Vectors:  6
    
  3. We plot the decision surface and support vectors (SV).

    plot(fit, df);
    

    [Image: output of plot(fit, df)]

    The SVs are marked by the x symbols. You can clearly see how the SVs are located nearest to the separating decision line.

  4. We can also extract the parameters of the decision line (i.e. its normal vector w and offset b), and manually plot the decision line and the data:

    # Normal vector and offset; the decision function is w . x - b
    w <- t(fit$coefs) %*% fit$SV
    b <- fit$rho;
    
    # Generate data for the decision line
    x.1 <- seq(min(df[, 1]), max(df[, 1]), length.out = 20);
    x.2 <- (b - w[1] * x.1) / w[2];
    df.h <- data.frame(x.1 = x.1, x.2 = x.2);
    
    # Plot, with x.2 on the horizontal axis and x.1 on the vertical axis
    library(ggplot2)
    ggplot(df, aes(x.2, x.1)) +
        geom_point(aes(colour = y), size = 2) +
        geom_line(data = df.h, aes(x = x.2, y = x.1), linetype = 2)
    

[Image: ggplot2 scatter plot of the data with the dashed decision line]
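To back up the visual impression with numbers, you could also tabulate each point's distance to the decision line next to its support-vector status, and derive the margin width from w. This is only a sketch built on the fit above: `check` and `margin_width` are names I made up for illustration, and the support-vector rows are taken from `fit$index`.

```r
library(e1071)

# Same sample data and model as in steps 1 and 2
set.seed(1)
x <- rbind(matrix(rnorm(10 * 2, mean = 0), ncol = 2),
           matrix(rnorm(10 * 2, mean = 2), ncol = 2))
y <- c(rep(-1, 10), rep(1, 10))
df <- data.frame(x = x, y = as.factor(y))
fit <- svm(y ~ ., data = df, kernel = "linear",
           type = "C-classification", scale = FALSE)

# Normal vector and offset, as in step 4
w <- drop(t(fit$coefs) %*% fit$SV)
b <- fit$rho

# Decision value f(x) = w . x - b for every training point
f <- drop(as.matrix(df[, c("x.1", "x.2")]) %*% w) - b

# Geometric distance to the decision line, and which rows are SVs
check <- data.frame(
    row      = seq_len(nrow(df)),
    class    = df$y,
    f        = round(f, 3),
    distance = round(abs(f) / sqrt(sum(w^2)), 3),
    is_sv    = seq_len(nrow(df)) %in% fit$index
)
check[order(check$distance), ]

# Width of the margin: distance between the lines f(x) = -1 and f(x) = +1
margin_width <- 2 / sqrt(sum(w^2))
margin_width
```

Up to the solver's tolerance, the rows that are not support vectors should have |f| of at least 1, i.e. they sit on or outside the margin. Note that a misclassified point also counts as a support vector however far it is from the line, so "nearest to the line" is a good rule of thumb here, not a strict definition.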

Maurits Evers
  • So the bug appears to be in the shading. Comparing the line you extracted to the shading from `kernlab`, it's clear that `kernlab` is not graphing the correct line. (The decision line should be in the middle of the lightest pink strip and the other darker shaded bands should have boundaries parallel to the decision line.) Thanks for your verification. – Sean Raleigh Apr 25 '18 at 02:43
  • Along the way, you also answered another long-standing question I've had about the ability to extract the parameters of the decision line from the model. ISLR (James, Witten, Hastie, Tibshirani) claims, "Unfortunately, the `svm()` function does not explicitly output the coefficients of the linear decision boundary obtained when the support vector classifier is fit, nor does it output the width of the margin." (pg. 331) So although that's somewhat true, it's nice to know that the information is available in the output, albeit indirectly. – Sean Raleigh Apr 25 '18 at 02:44
  • @SeanRaleigh I suggest sending an email to the `kernlab` [package maintainer](https://cran.r-project.org/web/packages/kernlab/index.html) with a link to this post. If this is a bug it should be fixed. – Maurits Evers Apr 25 '18 at 05:25