Error while computing confusion matrix for multiclassification using ranger

Question

I am trying to compute the confusion matrix for a multiclass classification problem on a very large data frame, which is split and scaled into Train_Scale and Test_Scale sets (the scaling parameters of the training set are used to scale the test set).

The model was fitted with ranger:

set.seed(123)
library(ranger)
library(caret)
Class.ranger <- ranger(Class~., data = Train_Scale, num.trees = 5000, importance = "impurity", save.memory = TRUE, probability = TRUE)

The variable Class has 5 levels:

str(Test_Scale$Class)
 Factor w/ 5 levels "A","B",..: 5 1 1 1 1 5 5 5 1 1 ...

Validation is done on the test set as follows:

set.seed(123)
probabilitiesClass <- predict(Class.ranger, data = Test_Scale, num.trees = 5000, type='response', verbose = TRUE)

The probabilitiesClass is a List of 5 as shown below:

I get the following error while trying to interpret the results via confusion matrix:

> caret::confusionMatrix(Test_Scale$Class, probabilitiesClass$predictions)
Error: `data` and `reference` should be factors with the same levels.

Should predictions in the figure above be a factor (it is currently double), since Class is a factor with 5 levels?

Alternatively, using table (note: there are no NA values either) gives the following error:

table(Test_Scale$Class, probabilitiesClass$predictions)
Error in table(Test_Scale$Class, probabilitiesClass$predictions): 
all arguments must have the same length

What is going wrong, and how can the confusion matrix be obtained for multiclass classification using ranger (preferred, since caret handles only up to 53 levels?) and caret?

Serkan · Answer 1 · 2021-08-04T14:46:18.797


Set type = 'raw' instead of 'response' to get the predicted class instead of the predicted probabilities.

probabilitiesClass <- predict(
       Class.ranger,
       data = Test_Scale,
       num.trees = 5000,
       type='raw',
       verbose = TRUE
)

That would make your comparison in the confusionMatrix possible.
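
If type = 'raw' is not accepted (ranger's predict does not list it among its valid types), another route is to convert the probabilities to classes by hand. For a forest fitted with probability = TRUE, predictions is a numeric matrix with one row per observation and one column per class, which is also why table() reports arguments of different lengths. Below is a minimal base-R sketch, assuming the columns are named after the class levels; prob_matrix and truth are made-up stand-ins for probabilitiesClass$predictions and Test_Scale$Class, not data from the question.

```r
# Hypothetical stand-in for probabilitiesClass$predictions: one row per test
# observation, one column per class, named after the factor levels.
class_levels <- c("A", "B", "C", "D", "E")
prob_matrix <- matrix(
  c(0.70, 0.10, 0.10, 0.05, 0.05,
    0.05, 0.05, 0.10, 0.10, 0.70,
    0.20, 0.50, 0.10, 0.10, 0.10,
    0.60, 0.10, 0.10, 0.10, 0.10),
  ncol = 5, byrow = TRUE,
  dimnames = list(NULL, class_levels)
)

# Hypothetical stand-in for Test_Scale$Class.
truth <- factor(c("A", "E", "C", "A"), levels = class_levels)

# Pick the most probable class in each row and rebuild a factor that has
# the same levels as the reference.
predicted <- factor(
  colnames(prob_matrix)[max.col(prob_matrix, ties.method = "first")],
  levels = levels(truth)
)

# Base-R confusion matrix: rows are predictions, columns are the truth.
conf_mat <- table(Predicted = predicted, Actual = truth)
print(conf_mat)

# With caret installed, the same two factors can be passed as
# caret::confusionMatrix(data = predicted, reference = truth)
```

Because both factors share the same levels, the table is always square, even when some class is never predicted.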

edited Aug 04 '21 at 14:46

answered Aug 04 '21 at 14:38

Serkan


I get the following error when I use raw instead of response: Error in predict.ranger.forest(forest, data, predict.all, num.trees, type, : Error: Invalid value for 'type'. Use 'response', 'se', 'terminalNodes', or 'quantiles'. – Ray Aug 04 '21 at 14:49
OK - then my generic answer is not applicable. Can you please reproduce your `data` and `code` so we can have a better look? – Serkan Aug 04 '21 at 14:51
I've tested this with `caret` and it works as intended - but `ranger` does not. I don't have an answer readily available for that! \o/ – Serkan Aug 04 '21 at 18:03
I have posted the code for iris data in this thread: https://stackoverflow.com/questions/68664858/error-in-calculating-confusion-matrix-or-contigency-table-for-multiclassificatio – Ray Aug 05 '21 at 10:27
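
When class probabilities are not needed, a different option is to grow a plain classification forest, in which case ranger returns the predicted classes directly. The sketch below uses the built-in iris data and a made-up 70/30 split; it is not the code from the linked thread, and the split and num.trees value are assumptions chosen for illustration.

```r
library(ranger)

set.seed(123)

# Hypothetical 70/30 split of the built-in iris data.
train_idx <- sample(nrow(iris), size = round(0.7 * nrow(iris)))
train <- iris[train_idx, ]
test <- iris[-train_idx, ]

# Leaving probability at its default (FALSE) grows a classification forest.
fit <- ranger(Species ~ ., data = train, num.trees = 500)

# For a classification forest, $predictions is a factor of predicted classes,
# one entry per row of the test data.
pred <- predict(fit, data = test)$predictions

# Confusion matrix: rows are predictions, columns are the true classes.
cm <- table(Predicted = pred, Actual = test$Species)
print(cm)
```

The trade-off is that the per-class probabilities are no longer available from this model.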
