Here is my sample code for SVM classification.

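    library(e1071)  # assumed source of svm()

    train <- read.csv("traindata.csv")
    test <- read.csv("testdata.csv")

    # svm() has no method argument; a factor response already selects classification
    svm.fit <- svm(as.factor(value) ~ ., data = train, kernel = "linear")

    # predict() on an svm object returns class labels by default
    svm.pred <- predict(svm.fit, test)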

The feature value in my example is a factor with two levels (true or false). I want to plot my SVM classifier with the points split into two groups: one for "true" and one for "false". How do we produce a 2D or 3D SVM plot? I tried plot(svm.fit, train), but it does not work for me. I found this answer on SO, but I am not clear on what t, x, y, z, w, and cl are in it:

Plotting data from an svm fit - hyperplane

I have about 50 features in my dataset, and the last column is a factor. Is there a simple way of doing this, or could anyone help explain that answer?

Mahsolid

1 Answer

The short answer is: you cannot. Your data is 50-dimensional, and you cannot plot 50 dimensions. All you can do is use rough approximations, reductions and projections, but none of these can actually represent what is happening inside. To plot a 2D/3D decision boundary, your data has to be 2D/3D (2 or 3 features, which is exactly the situation in the linked answer: they only have 3 features, so they can plot all of them). With 50 features you are left with statistical analysis, not actual visual inspection.
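To illustrate the 2-feature case, here is a minimal sketch. It assumes the `e1071` package (the `svm()` call in the question looks like it comes from there) and uses the built-in `iris` data, cut down to two features and two classes, as a stand-in for real data. With exactly two features, `plot()` on the fitted model can draw the decision regions directly:

```r
library(e1071)  # assumption: svm() and its plot method come from e1071

# Stand-in data: two numeric features and a two-level factor response
toy <- droplevels(subset(iris, Species != "setosa",
                         select = c(Petal.Length, Petal.Width, Species)))

fit2d <- svm(Species ~ ., data = toy, kernel = "linear")

# With two features the decision regions can be drawn in the feature plane
plot(fit2d, toy)
```

The same call on a 50-feature data frame has no single plane to draw in, which is why it does not work as-is there.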

You can of course look at some slices (select 3 features, or the leading principal components of a PCA projection). If you are not familiar with the underlying linear algebra, you can simply use the gmum.r package, which does this for you. Train an SVM and plot it, forcing the "pca" visualization, like here: http://r.gmum.net/samples/svm.basic.html.

    library(gmum.r)

    # We will perform basic classification on breast cancer dataset
    # using LIBSVM with linear kernel
    data(svm_breast_cancer_dataset)

    # We can pass either formula or explicitly X and Y
    svm <- SVM(X1 ~ ., svm.breastcancer.dataset, core="libsvm", kernel="linear", C=10)
    ## optimization finished, #iter = 8980
    pred <- predict(svm, svm.breastcancer.dataset[,-1])

    plot(svm, mode="pca")

which gives

[Figure: SVM visualization]

For more examples, refer to the project website: http://r.gmum.net/
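If installing gmum.r is not an option, a similar projection can be sketched with `e1071::svm()` and base R's `prcomp()`. This is not what gmum.r does internally, just a rough equivalent. The data below is synthetic and hypothetical (200 rows, 50 features named `f1` to `f50`, and a two-level factor `value`), standing in for `traindata.csv`:

```r
library(e1071)  # assumption: the svm() used in the question

set.seed(42)
# Hypothetical stand-in for the real training data
n <- 200
p <- 50
X <- matrix(rnorm(n * p), nrow = n,
            dimnames = list(NULL, paste0("f", seq_len(p))))
value <- factor(X[, 1] + X[, 2] > 0, levels = c(FALSE, TRUE))
train <- data.frame(X, value = value)

svm.fit <- svm(value ~ ., data = train, kernel = "linear")
pred <- predict(svm.fit, train)

# Project the 50 features onto the first two principal components
pca <- prcomp(train[, colnames(X)], center = TRUE, scale. = TRUE)
scores <- pca$x[, 1:2]

# Colour shows the predicted class, symbol shows the true class
plot(scores,
     col = c("red", "blue")[as.integer(pred)],
     pch = c(1, 19)[as.integer(train$value)],
     xlab = "PC1", ylab = "PC2",
     main = "SVM predictions in a PCA projection")
legend("topright", legend = levels(pred),
       col = c("red", "blue"), pch = 19, title = "predicted")
```

With real data, replace the synthetic block with `train <- read.csv("traindata.csv")` and convert `value` to a factor first. Non-numeric feature columns would need encoding before `prcomp()`.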

However, this only shows the projected points and their classification. You cannot see the hyperplane, because it is a high-dimensional object (in your case 49-dimensional), and in such a projection the hyperplane would be ... the whole screen. Not a single pixel would be left "outside". Think about it in these terms: if you have a 3D space with a hyperplane inside, that hyperplane is a 2D plane. If you now try to plot it in 1D, you end up with the whole line "filled" by your hyperplane, because wherever you place a line in 3D, the projection of the 2D plane onto that line fills it up. The only exception is a line perpendicular to the plane, in which case the projection is a single point. The same applies here: if you project a 49-dimensional hyperplane onto 3D, you end up with the whole screen "black".
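The 3D-to-1D argument can be checked numerically. The sketch below uses base R only. The plane (`w`, `b`) and the two projection directions are arbitrary illustrative choices, not taken from any fitted model:

```r
set.seed(1)

# A plane in 3D: all points x with sum(w * x) + b == 0 (arbitrary example values)
w <- c(1, 2, -1)
b <- 0.5

# Two orthonormal directions spanning the plane (both orthogonal to w)
basis <- qr.Q(qr(matrix(w, ncol = 1)), complete = TRUE)[, -1]

# Sample 2000 points on the plane: one fixed point x0 plus in-plane offsets
x0 <- -b * w / sum(w^2)
coords <- matrix(rnorm(2000 * 2, sd = 5), ncol = 2)
pts <- sweep(coords %*% t(basis), 2, x0, "+")

# Project onto a generic line (the x axis) and onto the plane's normal
on_generic <- drop(pts %*% c(1, 0, 0))
on_normal <- drop(pts %*% (w / sqrt(sum(w^2))))

cat("spread along a generic line:", diff(range(on_generic)), "\n")
cat("spread along the normal:    ", diff(range(on_normal)), "\n")
```

Along the generic line the projected points spread out further the more of the plane you sample, while along the normal they all land on the same value (up to rounding). A 49-dimensional hyperplane projected onto 2 or 3 plotting axes behaves like the first case.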

lejlot
  • Can't I use "train" in place of `svm_breast_cancer_dataset`? – Mahsolid Mar 31 '16 at 19:33
  • Exactly. You will not be able to plot decision boundary in **any real life dataset**. This is possible only for toy datasets, which have up to 3 features. For real data - you can still visualize something (like described in the answer) - but not the actual hyperplane – lejlot Mar 31 '16 at 19:33
  • Go through basics of calling svm on your data, everything is described in the link provided. – lejlot Mar 31 '16 at 19:35
  • It is the same example given on the link as you can see it here: http://r.gmum.net/svm/ :'( – Mahsolid Mar 31 '16 at 19:41
  • And here is the example using just matrices: http://r.gmum.net/samples/svm.example.weights.html. Just substitute x and y with your own train data (features for x, labels for y), remove "weights", and use the "pca" mode instead of "contour" in plotting – lejlot Mar 31 '16 at 19:43
  • I have this error. `Error in data.matrix(x) : number of items to replace is not a multiple of replacement length` – Mahsolid Mar 31 '16 at 21:53
  • Just call `svm <- SVM(as.factor(value)~ ., train, core="libsvm", kernel="linear")`. If you have further problems - ask separate question, comments section is not for debugging the code. – lejlot Mar 31 '16 at 22:11
  • No, it does not. C is the cost hyperparameter of the SVM. Again - comments are not the place for such questions, ask a separate question. – lejlot Mar 31 '16 at 22:52