I am experimenting with the svm() function (from the e1071 package) on the iris data. The goal is to extract, for each row of the output matrix attr(pred_prob, "probabilities"), the class with the highest predicted probability.

library(e1071)  # provides svm()
data(iris)
x <- subset(iris, select = -Species)  # predictors only
y <- iris$Species                     # class labels
model <- svm(x, y, probability = TRUE)
pred_prob <- predict(model, x, decision.values = TRUE, probability = TRUE)
attr(pred_prob, "probabilities")      # matrix of per-class probabilities

(The original code came from this previous thread.)
The last line of code will give us an output of the following format:

       setosa  versicolor   virginica
1 0.979989881 0.011347796 0.008662323
2 0.972567961 0.018145783 0.009286256
3 0.978668604 0.011973933 0.009357463

For ease of comparing these predicted probabilities with their real class "labels" (i.e., setosa, versicolor, virginica), I plan to extract the class of highest predicted probability for each row from the above output matrix. For example, the class of highest probability for the first observation is setosa with predicted probability of 0.9799, which is returned from

which(attr(pred_prob, "probabilities")[1,] == max(attr(pred_prob, "probabilities")[1,]), arr.ind = TRUE)

I am now trying to extend the above code into a loop that outputs a column containing the predicted class label for each observation in the data. Below is what I have so far, but I am having a hard time getting it to work:

predicted_class <- attr(pred_prob, "probabilities")
for(row in 1:nrow(predicted_class)) {
output <- print(which(predicted_class[row,] == max(predicted_class[row,]), arr.ind = TRUE))
output
}

But this does not give me what I intended: it seems to return the predicted class for only a single row, whereas I want a column of predicted classes for all observations. Could anyone enlighten me on this?

Chris T.

1 Answer

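Use max.col on the probability matrix. Here pred_prob stands for that matrix (see data at the end); with the code in the question it is attr(pred_prob, "probabilities").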

colnames(pred_prob)[max.col(pred_prob)]
#[1] "setosa" "setosa" "setosa"
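
One caveat: max.col breaks ties at random by default, so passing ties.method = "first" makes the result reproducible. A minimal sketch, rebuilding the three-row matrix from the question so the snippet runs on its own (the names probs and idx are only illustrative):

```r
# Rebuild the small probability matrix shown in the question
# (matrix() fills column by column, so each line below is one class)
probs <- matrix(
  c(0.979989881, 0.972567961, 0.978668604,   # setosa
    0.011347796, 0.018145783, 0.011973933,   # versicolor
    0.008662323, 0.009286256, 0.009357463),  # virginica
  nrow = 3,
  dimnames = list(c("1", "2", "3"),
                  c("setosa", "versicolor", "virginica"))
)

# Column index of the largest value in each row; ties go to the first column
idx <- max.col(probs, ties.method = "first")

# Translate the indices into class labels
predicted_class <- colnames(probs)[idx]
predicted_class
#[1] "setosa" "setosa" "setosa"
```

Looking the labels up through colnames() means the code does not depend on the order in which the classes appear as columns.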

Or using a loop

output <- integer(nrow(pred_prob))  # which.max returns integer indices

for(row in seq_len(nrow(pred_prob))) {
  output[row] <- which.max(pred_prob[row,])
}

output
#[1] 1 1 1

Or apply

apply(pred_prob, MARGIN = 1, FUN = which.max)
#1 2 3 
#1 1 1 
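
Both the loop and apply return column indices rather than labels. To get the column of predicted classes the question asks for, the indices can be mapped back through colnames() and placed next to the true labels. A small sketch under the assumption that the three rows correspond to the first three iris observations, as in the question's output (probs and result are illustrative names):

```r
# Same three-row probability matrix as in the question
probs <- matrix(
  c(0.979989881, 0.972567961, 0.978668604,
    0.011347796, 0.018145783, 0.011973933,
    0.008662323, 0.009286256, 0.009357463),
  nrow = 3,
  dimnames = list(c("1", "2", "3"),
                  c("setosa", "versicolor", "virginica"))
)

# Index of the most probable class in each row
idx <- apply(probs, MARGIN = 1, FUN = which.max)

# One row per observation: true label, predicted label, winning probability
result <- data.frame(
  actual    = as.character(iris$Species[1:3]),
  predicted = colnames(probs)[idx],
  max_prob  = apply(probs, MARGIN = 1, FUN = max)
)
result

# Share of rows where the prediction matches the true label
mean(result$actual == result$predicted)
```

With the full matrix from attr(pred_prob, "probabilities"), the same steps would apply with iris$Species in place of the three-row subset.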

data

pred_prob <- structure(c(0.979989881, 0.972567961, 0.978668604, 0.011347796, 
0.018145783, 0.011973933, 0.008662323, 0.009286256, 0.009357463
), .Dim = c(3L, 3L), .Dimnames = list(c("1", "2", "3"), c("setosa", 
"versicolor", "virginica")))
markus