In R, what does `probability=TRUE` do in the `svm` function of the `e1071` package?

model <- svm(Type ~ ., data = data, probability = TRUE, cost = 100, gamma = 1)
jbaums
A.M.
  • svm isn't in base R - please mention which package(s) you are using. – Dason Jun 13 '14 at 04:37
  • From `?svm`: "probability: indicating whether the model should allow for probability predictions." –  Jun 13 '14 at 04:38
  • I use `e1071` package. – A.M. Jun 13 '14 at 04:39
  • @user3681744 Add it to the question. People should get all relevant information from reading the question itself - they shouldn't have to dig into the comments. – Dason Jun 13 '14 at 04:41

1 Answer

Setting the `probability` argument to `TRUE` for both model fitting and prediction returns, for each prediction, the vector of probabilities of belonging to each class of the response variable. These are stored in a matrix, attached to the prediction object as the `"probabilities"` attribute.

For example:

library(e1071)

model <- svm(Species ~ ., data = iris, probability=TRUE)
# (below I'm just predicting to the training dataset - it could of course just 
# as easily be a separate test dataset)
pred <- predict(model, iris, probability=TRUE)

head(attr(pred, "probabilities"))

#      setosa versicolor   virginica
# 1 0.9803339 0.01129740 0.008368729
# 2 0.9729193 0.01807053 0.009010195
# 3 0.9790435 0.01192820 0.009028276
# 4 0.9750030 0.01531171 0.009685342
# 5 0.9795183 0.01164689 0.008834838
# 6 0.9740730 0.01679643 0.009130620
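
As a rough sketch of how this matrix might be used (the variable names `probs`, `row_totals` and `top_class` are illustrative, not part of e1071): each row holds one observation's class probabilities and should sum to 1, and the most probable class can be read off by column name.

```r
library(e1071)

model <- svm(Species ~ ., data = iris, probability = TRUE)
pred  <- predict(model, iris, probability = TRUE)

# One row per observation, one column per class of the response.
probs <- attr(pred, "probabilities")
dim(probs)

# Each row is a probability distribution over the classes.
row_totals <- rowSums(probs)
range(row_totals)

# Most probable class per observation, looked up by column name so the
# result does not depend on the order of the columns.
top_class <- colnames(probs)[max.col(probs, ties.method = "first")]
table(top_class, actual = iris$Species)
```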

Note, however, that it's important to set `probability=TRUE` in the call to `svm`, and not just in the call to `predict`, since the latter alone would produce:

#      setosa versicolor virginica
# 1 0.3333333  0.3333333 0.3333333
# 2 0.3333333  0.3333333 0.3333333
# 3 0.3333333  0.3333333 0.3333333
# 4 0.3333333  0.3333333 0.3333333
# 5 0.3333333  0.3333333 0.3333333
# 6 0.3333333  0.3333333 0.3333333
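
One way to guard against this is to check the attribute before using it. The helper below, `has_usable_probs`, is a hypothetical name introduced for this sketch, not an e1071 function. It treats a missing attribute or a constant matrix as unusable; the missing-attribute case is included on the assumption that some e1071 versions warn and attach no probabilities rather than returning the uniform values shown above.

```r
library(e1071)

# Hypothetical helper: TRUE only if the prediction carries a
# "probabilities" matrix whose entries are not all identical.
has_usable_probs <- function(pred) {
  probs <- attr(pred, "probabilities")
  !is.null(probs) && diff(range(probs)) > 1e-8
}

# probability = TRUE only in predict(): not enough.
fit_plain  <- svm(Species ~ ., data = iris)
pred_plain <- suppressWarnings(predict(fit_plain, iris, probability = TRUE))
has_usable_probs(pred_plain)

# probability = TRUE in both svm() and predict().
fit_prob  <- svm(Species ~ ., data = iris, probability = TRUE)
pred_prob <- predict(fit_prob, iris, probability = TRUE)
has_usable_probs(pred_prob)
```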
jbaums
  • I get only `NULL` when I access the attribute; I also get binary output, whereas I'm expecting probabilities. – bicepjai Jul 30 '15 at 07:19
  • @bicepjai Under what circumstances? When you run the code in my post above? – jbaums Jul 30 '15 at 07:20
  • No, your code gives the correct output; my code with my data (binary predictor and other factor and numeric independent variables) gives just binary numbers, not probabilities. – bicepjai Jul 30 '15 at 07:33
  • @bicepjai Are you sure you are using `probability=TRUE` in the `predict` call? If so (and if it's not already been asked), you might need to post a new question with an example that reproduces your problem. It's too hard to work out where things have gone wrong from comments. – jbaums Jul 30 '15 at 07:37
  • Is that a natural result of a Support Vector Machine or an add-on by the library? – Chris Sep 23 '15 at 16:31
  • How are those probabilities calculated in training and prediction? Is it done by fitting a logistic regression on the SVM scores (the signed distance from each point to the separating hyperplane)? – panc Jan 22 '17 at 16:51
  • @bicepjai's problem: make sure your outcome variable is coded as a factor. The `"probabilities"` attribute was `NULL` for me because my outcome variable was numeric instead of a factor. After changing it to a factor, it worked fine. – Joel H May 10 '18 at 17:03
  • @panc It seems so, from the [docs](https://rdrr.io/cran/e1071/man/svm.html): "The probability model for classification fits a logistic distribution using maximum likelihood to the decision values of all binary classifiers, and computes the a-posteriori class probabilities for the multi-class problem using quadratic optimization." – ngmir Jul 20 '21 at 08:24
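
Joel H's point about the outcome type can be illustrated with a small sketch. The two-class subset of `iris` and the column name `y` are assumptions made for this example. With a numeric `y`, `svm` fits a regression model, so no class probabilities are attached; with a factor `y`, it fits a classifier and the attribute is present.

```r
library(e1071)

# Two-class subset of iris with a 0/1 outcome stored as a number.
keep <- iris$Species != "setosa"
two_class <- iris[keep, 1:4]
two_class$y <- as.numeric(iris$Species[keep] == "virginica")

# Numeric outcome: treated as regression, so no "probabilities" attribute.
fit_num  <- svm(y ~ ., data = two_class, probability = TRUE)
pred_num <- predict(fit_num, two_class, probability = TRUE)
is.null(attr(pred_num, "probabilities"))

# Same data with the outcome converted to a factor: classification.
two_class$y <- factor(two_class$y)
fit_fac  <- svm(y ~ ., data = two_class, probability = TRUE)
pred_fac <- predict(fit_fac, two_class, probability = TRUE)
head(attr(pred_fac, "probabilities"))
```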