I've got a data set with 73 columns, and almost all of them are factors. I'm trying to figure out which one of them is causing this error, but I'm out of ideas. Thanks to other questions here, I was able to write a loop that compares the levels and fixes them if needed, but it reports NO differences. Does anyone have an idea?
This is my loop to ensure the levels are correct:
for (factor_var in factor_vars) {
  if (isFALSE(all.equal(levels(test[[factor_var]]), levels(train[[factor_var]])))) {
    print(paste('problem in:', factor_var))
    test[[factor_var]] <- factor(test[[factor_var]], levels = levels(train[[factor_var]]))
  } else {
    print(paste('ok:', factor_var))
  }
}
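One thing worth checking in this loop: `all.equal()` never returns `FALSE`. It returns `TRUE` when the objects match and a character vector describing the differences otherwise, so `isFALSE(all.equal(...))` is `FALSE` in both cases and the "problem" branch can never run. Every column would then print "ok" regardless of its levels. A minimal sketch of the same check using `identical()` instead; `train_demo` and `test_demo` are made-up toy data, not the real sets:

```r
# all.equal() returns TRUE or a character description, never FALSE
res <- all.equal(c("x", "y"), c("x", "y", "z"))
print(res)           # a character vector describing the difference
print(isFALSE(res))  # FALSE, so an isFALSE() check never fires

# Hypothetical toy data standing in for train/test
train_demo <- data.frame(f = factor(c("a", "b")), g = factor(c("u", "v")))
test_demo  <- data.frame(f = factor(c("a", "c")), g = factor(c("u", "v")))

for (factor_var in c("f", "g")) {
  train_lv <- levels(train_demo[[factor_var]])
  # identical() gives a plain TRUE/FALSE, so the negation works as intended
  if (!identical(levels(test_demo[[factor_var]]), train_lv)) {
    message("problem in: ", factor_var)
    # Values not among the training levels become NA here
    test_demo[[factor_var]] <- factor(test_demo[[factor_var]], levels = train_lv)
  } else {
    message("ok: ", factor_var)
  }
}
```

`!isTRUE(all.equal(...))` would work as well. Note that re-leveling turns unseen values into `NA`, which the model may not accept either, so those rows may still need separate handling.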
No factor was changed, so I really don't understand why I still get the following error:
> yhat$rf <- predict(modelLib$rf, newdata = test)
Error in predict.randomForest(.model$learner.model, newdata = .newdata, :
New factor levels not present in the training data
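It may also help to know that, as far as I can tell from its source, `predict.randomForest()` does not look at `train` at prediction time. It compares the levels of each factor column in `newdata` against the levels stored on the fitted model (`forest$xlevels` on a `randomForest` object) and stops if `newdata` has any level the model never saw, even one that no row actually uses. So if `train` was modified after fitting, or the model was fit on a different copy of the data, a comparison against `train` can look clean while the model still disagrees. A minimal sketch of checking against reference levels directly; `find_unseen_levels()` is a hypothetical helper and `ref_levels` is a toy stand-in for `forest$xlevels`. Judging from the traceback, `modelLib$rf` appears to be an mlr wrapped model, in which case `mlr::getLearnerModel()` should return the underlying `randomForest` object (an assumption about this setup).

```r
# Hypothetical helper: for each factor column of newdata, list the levels
# that are absent from the reference levels (e.g. rf$forest$xlevels).
find_unseen_levels <- function(newdata, ref_levels) {
  out <- list()
  for (col in names(newdata)) {
    x <- newdata[[col]]
    # Only factor columns that also appear in the reference are checked
    if (is.factor(x) && col %in% names(ref_levels)) {
      unseen <- setdiff(levels(x), as.character(ref_levels[[col]]))
      if (length(unseen) > 0) out[[col]] <- unseen
    }
  }
  out
}

# Toy stand-in for the levels stored in a fitted model
ref_levels <- list(color = c("red", "blue"), size = c("S", "M", "L"))

# "green" is a level of color but is not used by any row
newdata <- data.frame(
  color = factor(c("red", "blue"), levels = c("red", "blue", "green")),
  size  = factor(c("S", "L"), levels = c("S", "M", "L")),
  n     = c(1, 2)
)

print(find_unseen_levels(newdata, ref_levels))

# With the real model (assumption: mlr wrapper around randomForest):
# rf <- mlr::getLearnerModel(modelLib$rf)
# find_unseen_levels(test, rf$forest$xlevels)
```

Checking levels rather than values is deliberate here: an unused level such as "green" above is enough to trigger the error, and `droplevels()` on `test` would remove it.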
What else can I try?