
I am using the confusionMatrix function from the caret library in R to evaluate the performance of several methods (elastic net from the glmnet library, Gaussian processes from kernlab, random forest) on two-class data.

For some of the methods, I sometimes get:

Warning message: In confusionMatrix.default(pred, truth) : Levels are not in the same order for reference and data. Refactoring data to match.

and the performance is, e.g., 65 percent. However, if I relabel the levels of the predictions (pred in the example above), reversing their order relative to "truth", the performance becomes 25%.

I constructed the following toy data.

library(caret)

pred <- c("a", "a", "a", "b")
pred <- as.factor(pred)
levels(pred) <- rev(levels(pred))  # depending on whether this line is run, I get either 25% or 75%

truth <- c("a", "a", "b", "b")
truth <- as.factor(truth)

confusionMatrix(pred, truth)

I understand this is intuitive, since the data has two classes. However, I wonder whether I am allowed to do this in my favour: that is, if the performance is 25%, can I simply report it as 75%?

Areza

1 Answer


See ?caret::confusionMatrix, specifically the parameter positive

positive: an optional character string for the factor level that corresponds to a "positive" result (if that makes sense for your data). If there are only two factor levels, the first level will be used as the "positive" result.

On a second note, unless your classes are roughly 50-50, you should probably evaluate your results with something other than a confusion matrix.
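As a rough illustration of the `positive` argument, the sketch below reuses the toy vectors from the question and assumes the caret package is installed. The level order is fixed explicitly on both factors so that no refactoring warning is triggered; the variable names `cm_a` and `cm_b` are only for illustration.

```r
library(caret)

# Same toy data as in the question, with identical level order on both factors
lv    <- c("a", "b")
pred  <- factor(c("a", "a", "a", "b"), levels = lv)
truth <- factor(c("a", "a", "b", "b"), levels = lv)

# Treat "a" as the positive class, then "b"
cm_a <- confusionMatrix(data = pred, reference = truth, positive = "a")
cm_b <- confusionMatrix(data = pred, reference = truth, positive = "b")

# Overall accuracy does not depend on which class is called positive
cm_a$overall[["Accuracy"]]     # 0.75
cm_b$overall[["Accuracy"]]     # 0.75

# Class-specific metrics swap roles instead
cm_a$byClass[["Sensitivity"]]  # 1.0 (both true "a" cases predicted "a")
cm_b$byClass[["Sensitivity"]]  # 0.5 (one of two true "b" cases predicted "b")
```

In this sketch, changing `positive` only changes which class sensitivity and specificity refer to; it does not turn a 75% accuracy into 25%.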

alexwhitworth
  • So, hypothetically, if one builds a classifier for two-class data that separates non-cancer patients from cancer patients with 30% accuracy, can (s)he not turn it around and say that it predicts cancer patients with 70% accuracy? I would like to hear your comments on that too :-). I should add that I don't think the "positive" parameter can work here. – Areza Aug 19 '15 at 15:10
  • Huh? If you only have two classes, then non-membership in one class is a sufficient condition for membership in the other class. – alexwhitworth Aug 19 '15 at 18:16
  • If you do not use `positive = "..."`, then the positive class defaults to the first level. So when you use `rev(levels(pred))` you get the result you saw. If you want to use `rev(levels(pred))`, then you must use `positive = "..."` to get the same result in each case. – alexwhitworth Aug 19 '15 at 18:17
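One detail worth checking in the toy example: in base R, assigning to `levels()` renames the labels of the existing observations rather than just changing the order of the levels. The minimal sketch below uses base R only; the helper `accuracy` and the names `reordered` and `relabelled` are made up for this illustration.

```r
truth <- factor(c("a", "a", "b", "b"))
pred  <- factor(c("a", "a", "a", "b"))

# Hypothetical helper: share of predictions whose label matches the truth
accuracy <- function(p, t) mean(as.character(p) == as.character(t))

# Reordering: the level order changes, the observations keep their labels
reordered <- factor(pred, levels = rev(levels(pred)))

# Relabelling: every "a" becomes "b" and vice versa
relabelled <- pred
levels(relabelled) <- rev(levels(relabelled))

as.character(reordered)      # "a" "a" "a" "b"
as.character(relabelled)     # "b" "b" "b" "a"

accuracy(pred, truth)        # 0.75
accuracy(reordered, truth)   # 0.75
accuracy(relabelled, truth)  # 0.25
```

Under this reading, the jump between 75% and 25% in the question comes from the predictions themselves being flipped, not from the order in which the levels are listed.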