I am working in R and trying to learn how to build a confusion matrix for a multiclass variable (see, e.g., "How to construct the confusion matrix for a multi class variable").

Suppose I generate some data and fit a decision tree model:

# load libraries
library(rpart)
library(caret)

# generate data
a <- rnorm(1000, 10, 10)
b <- rnorm(1000, 10, 5)
d <- rnorm(1000, 5, 10)

group_1 <- sample(LETTERS[1:3], 1000, replace = TRUE, prob = c(0.33, 0.33, 0.34))

e <- data.frame(a, b, d, group_1)

# convert the response to a factor (the column lives in `e`, not in the numeric vector `d`)
e$group_1 <- as.factor(e$group_1)

#split data into train and test set
trainIndex <- createDataPartition(e$group_1, p = .8, 
                                  list = FALSE, 
                                  times = 1)
training <- e[trainIndex,]
test  <- e[-trainIndex,]


fitControl <- trainControl(
    ## 5-fold CV
    method = "repeatedcv",
    number = 5,
    ## repeated once
    repeats = 1)

# fit decision tree model
TreeFit <- train(group_1 ~ ., data = training,
                 method = "rpart2",
                 trControl = fitControl)

From here, I am able to store the results into a "confusion matrix":

pred <- predict(TreeFit,test)
table_example <- table(pred,test$group_1)

This works, but with a plain table I have to calculate the accuracy metrics for "A", "B" and "C" (as well as the overall accuracy) by hand.
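For reference, here is a minimal sketch of what that manual calculation might look like in base R. The counts are made up for illustration and are not output from the model above; the matrix only mimics the layout of table(pred, test$group_1), with predictions in rows and actual classes in columns.

```r
# Hypothetical 3-class confusion table (made-up counts):
# rows = predicted class, columns = actual class
tab <- matrix(c(50, 12,  5,
                 8, 45, 12,
                 7, 10, 53),
              nrow = 3, byrow = TRUE,
              dimnames = list(pred = c("A", "B", "C"),
                              actual = c("A", "B", "C")))

# Overall accuracy: correct predictions (the diagonal) over all observations
accuracy <- sum(diag(tab)) / sum(tab)

# Per-class recall (sensitivity): correct predictions over the actual
# number of cases in each class (column sums)
recall <- diag(tab) / colSums(tab)

# Per-class precision: correct predictions over the number of times
# each class was predicted (row sums)
precision <- diag(tab) / rowSums(tab)

print(accuracy)
print(round(rbind(recall, precision), 3))
```

The same expressions should work on the object returned by table(), since a two-way table behaves like a matrix here.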

My question: Is it possible to use the caret::confusionMatrix() command for this problem?

e.g.

  pred <- predict(TreeFit, test, type = "prob")
  labels_example <- as.factor(ifelse(pred[,2]>0.5, "1", "0"))
  con <- confusionMatrix(labels_example, test$group_1)

This way, I would be able to directly access the accuracy measurements from the confusion matrix. E.g. metric = con$overall[1]
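One caveat with the snippet above: with three classes, thresholding a single probability column at 0.5 produces only the labels "0" and "1", which do not overlap with the levels "A", "B" and "C", so confusionMatrix() is unlikely to accept them. If class probabilities are really needed, one possible approach (a sketch only, using a hypothetical probability table shaped like the output of predict(..., type = "prob")) is to pick the most probable class per row:

```r
# Hypothetical class probabilities, one column per class
# (made-up values, not output from TreeFit)
probs <- data.frame(A = c(0.6, 0.2, 0.1, 0.3),
                    B = c(0.3, 0.5, 0.2, 0.3),
                    C = c(0.1, 0.3, 0.7, 0.4))

# Index of the largest probability in each row;
# ties.method = "first" keeps tie-breaking deterministic
top <- max.col(as.matrix(probs), ties.method = "first")

# Map the column index back to a class label, keeping all levels
# so the factor lines up with the reference labels
labels <- factor(colnames(probs)[top], levels = colnames(probs))
print(labels)
```

For a plain class prediction, calling predict() without type = "prob" already returns a factor of labels, so this step is only needed when the probabilities themselves are of interest.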

Thanks

Sinval
stats_noob

1 Answer

Is this what you're looking for?

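# confusionMatrix() takes the predictions as `data` and the true labels as `reference`;
# overall accuracy is the same either way, but the per-class metrics depend on the order
pred <- predict(TreeFit, test)
con <- confusionMatrix(data = pred, reference = test$group_1)
con
con$overall[1]

Same output as in:

table(pred, test$group_1)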

Plus accuracy metrics.
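To illustrate where those metrics live in the returned object, here is a small self-contained sketch. The label vectors are made up (they are not the model's predictions), and it assumes the caret package is installed. For a multiclass problem, con$byClass should be a matrix with one row per class, named like "Class: A".

```r
library(caret)

# Made-up true labels and predictions for three classes
lv    <- c("A", "B", "C")
truth <- factor(c("A", "A", "B", "B", "C", "C", "A", "B", "C", "C"), levels = lv)
pred  <- factor(c("A", "B", "B", "B", "C", "A", "A", "C", "C", "C"), levels = lv)

# Predictions go in `data`, true labels in `reference`
con <- confusionMatrix(data = pred, reference = truth)

# Overall accuracy, accessed by name rather than by position
print(con$overall["Accuracy"])

# Per-class metrics: one row per class, one column per metric
print(con$byClass[, c("Sensitivity", "Specificity", "Balanced Accuracy")])

# A single value, e.g. sensitivity (recall) for class A
print(con$byClass["Class: A", "Sensitivity"])
```

Accessing entries by name (for example con$overall["Accuracy"]) tends to be more robust than con$overall[1] if the order of the returned metrics ever changes.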

Max Serna