I am using ranger to fit a multi-class classification model on a large mixed-type data frame (some categorical variables have more than 53 levels). Training and testing run without any problem. However, building the confusion matrix / contingency table is giving me trouble.
To illustrate the difficulties, I am using the iris data instead, treating Species as the class variable:
library(ranger)
library(caret)
# Data
idx = sample(nrow(iris),100)
data = iris
# Split data sets
Train_Set = data[idx,]
Test_Set = data[-idx,]
# Train
Species.ranger <- ranger(Species ~ ., data = Train_Set, importance = "impurity", save.memory = TRUE, probability = TRUE)
# Test
probabilitiesSpecies <- predict(Species.ranger, data = Test_Set,type='response', verbose = TRUE)
or
probabilitiesSpecies <- as.data.frame(predict(Species.ranger, data = Test_Set,type='response', verbose = TRUE)$predictions)
With either form, I run into the following difficulties:
table(Test_Set$Species, probabilitiesSpecies$predictions)
Error in table(Test_Set$Species, probabilitiesSpecies$predictions) :
all arguments must have the same length
or
caret::confusionMatrix(Test_Set$Species, probabilitiesSpecies$predictions)
or
caret::confusionMatrix(table(Test_Set$Species, max.col(probabilitiesSpecies)-1))
gives
Error: `data` and `reference` should be factors with the same levels.
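My current guess about the length error is that, with probability=TRUE, the predictions come back as a matrix with one row per test observation and one column per class, so it has more cells than Test_Set$Species has elements. A minimal sketch to check the shapes, assuming that reading is right (the seed and the names fit and prob are only for illustration):

```r
library(ranger)

set.seed(1)  # arbitrary seed, only so the sketch is repeatable
idx <- sample(nrow(iris), 100)
Train_Set <- iris[idx, ]
Test_Set  <- iris[-idx, ]

# Probability forest, as in the code above
fit  <- ranger(Species ~ ., data = Train_Set, probability = TRUE)
prob <- predict(fit, data = Test_Set)$predictions

dim(prob)                 # rows = test observations, columns = classes
colnames(prob)            # should be the class labels, if my assumption holds
length(prob)              # number of cells in the matrix
length(Test_Set$Species)  # number of test observations
```

If the two lengths differ, that would explain why table() complains, and max.col(...) - 1 would produce 0/1/2 rather than the Species labels.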
A binary classification, shown below, works, however:
idx = sample(nrow(iris),100)
data = iris
data$Species = factor(ifelse(data$Species=="virginica",1,0))
Train_Set = data[idx,]
Test_Set = data[-idx,]
# Train
Species.ranger <- ranger(Species ~ ., data = Train_Set, importance = "impurity", save.memory = TRUE, probability = TRUE)
# Test
probabilitiesSpecies <- as.data.frame(predict(Species.ranger, data = Test_Set,type='response', verbose = TRUE)$predictions)
caret::confusionMatrix(table(max.col(probabilitiesSpecies)-1, Test_Set$Species))
How can I get the confusion matrix in the multi-class case? I have also posted this as a separate thread (Error while computing confusion matrix for multiclassification using ranger).
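One direction I am considering, as a rough sketch rather than a confirmed fix: map each row of the probability matrix to the label of its most probable class, then compare two factors that share the same levels. This assumes the columns of the prediction matrix are named after the class levels; the seed and the names fit, prob and pred are only for illustration.

```r
library(ranger)
library(caret)

set.seed(1)  # arbitrary seed, only so the sketch is repeatable
idx <- sample(nrow(iris), 100)
Train_Set <- iris[idx, ]
Test_Set  <- iris[-idx, ]

fit  <- ranger(Species ~ ., data = Train_Set, probability = TRUE)
prob <- predict(fit, data = Test_Set)$predictions

# Assumption: colnames(prob) are the class labels.
# Pick the column with the highest probability in each row.
pred_idx <- max.col(prob, ties.method = "first")

# Turn the column index into a factor with the same levels as the reference.
pred <- factor(colnames(prob)[pred_idx], levels = levels(Test_Set$Species))

table(Predicted = pred, Actual = Test_Set$Species)
cm <- caret::confusionMatrix(data = pred, reference = Test_Set$Species)
cm
```

Forcing the levels of pred to match Test_Set$Species is meant to avoid the "same levels" error even when a class is never predicted in the test set.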