I want to perform multi-class classification with the caret
package. Below is a minimal example.
library(caret)
library(randomForest)
x <- data.frame("A"=seq(1,100), "B"=seq(1,100), "C"="class1")
x[,"C"] <- as.character(x[,"C"])
x[1,"C"] <- "class2"
x[2,"C"] <- "class3"
x[3,"C"] <- "class4"
x[4,"C"] <- "class5"
x[5,"C"] <- "class6"
x[6,"C"] <- "class7"
x[7,"C"] <- "class8"
x[8,"C"] <- "class9"
x[9,"C"] <- "class10"
x[10,"C"] <- "class11"
x[11,"C"] <- "class12"
x[,"C"] <- as.factor(x[,"C"])
control <- trainControl(method="repeatedcv", number=10, repeats=1, search="grid")
set.seed(5)
metric <- "Accuracy"  # assumed; metric was not defined in the original snippet
tunegrid <- expand.grid(.mtry=c(1:2))
fit <- train(x=x[,1:2], y=x$C, method="rf", metric=metric, tuneGrid=tunegrid, trControl=control)
print(fit)
plot(fit)
When running the code, I get the following error:
1: model fit failed for Fold2.Rep1: mtry=1 Error in randomForest.default(x, y, mtry = param$mtry, ...) :
Can't have empty classes in y.
Related posts suggest that this happens because resampling does not account for unused factor levels in the response variable: a class can be absent from a training fold while its level is still present in the factor. One typically runs into the problem when there are many classes to predict and only a few observations per class.
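To illustrate the suspected cause, here is a minimal base-R sketch. It does not use caret at all. The hold-out rows (`holdout`) are a hypothetical fold chosen by hand, not the fold caret would actually draw. The sketch only shows that subsetting a factor keeps levels with zero observations, and that droplevels() removes them.

```r
# Rebuild a response like the one above: 11 singleton classes plus "class1".
y <- factor(c(paste0("class", 2:12), rep("class1", 89)))

# Hypothetical hold-out fold (assumption): rows 1-10, each a singleton class.
holdout <- 1:10
y_train <- y[-holdout]

# Subsetting keeps all original levels, so the held-out classes
# remain as levels with zero observations in the training response.
empty <- names(which(table(y_train) == 0))
print(empty)

# droplevels() removes the unused levels.
y_clean <- droplevels(y_train)
print(nlevels(y_train))
print(nlevels(y_clean))
```

In this sketch, `y_train` still carries all 12 levels even though only two classes are observed, which is the situation randomForest appears to reject.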
Is there a workaround that makes the caret package remove the unused factor levels within its resampling methods (e.g., via droplevels())?