I've got a data set with 73 columns, almost all of them factors. I'm trying to figure out which one is causing this error, but I'm out of ideas. Thanks to other questions here, I was able to write a loop that compares the levels and fixes them if needed, but it reports NO difference. Does anyone have an idea?

This is my loop to ensure the levels are correct:

for(factor_var in factor_vars) {
  if (isFALSE(all.equal(levels(test[[factor_var]]), levels(train[[factor_var]])))) {
    print(paste('problem in:', factor_var))
    test[[factor_var]] <- factor(test[[factor_var]], levels = levels(train[[factor_var]]))
  } else {
    print(paste('ok:', factor_var))
  }
}

No factor was changed, so I really don't understand why I still get the following error:

> yhat$rf <- predict(modelLib$rf, newdata = test)
Error in predict.randomForest(.model$learner.model, newdata = .newdata,  : 
  New factor levels not present in the training data

What else can I try?
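
A note on the check in the loop above: `all.equal()` returns `TRUE` when its arguments match and otherwise a character vector describing the mismatch, so wrapping it in `isFALSE()` may never flag a column. The sketch below uses made-up level vectors, and `levels_differ` is a hypothetical helper name, not part of any package.

```r
# Made-up level vectors that differ in one entry.
train_levels <- c("a", "b", "c")
test_levels  <- c("a", "b", "d")

# all.equal() yields TRUE or a character description, not FALSE.
res <- all.equal(test_levels, train_levels)
print(is.character(res))  # a description of the mismatch, not a logical
print(isFALSE(res))       # FALSE, so the "problem in:" branch is skipped

# Two checks that do react to a mismatch.
print(!isTRUE(res))
print(!identical(test_levels, train_levels))

# Hypothetical helper: TRUE when two factors have different level sets or order.
levels_differ <- function(x, y) !identical(levels(x), levels(y))
print(levels_differ(factor(test_levels), factor(train_levels)))
```

With `!isTRUE(all.equal(...))` or `!identical(...)` as the condition, the loop should at least report the columns whose levels differ.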

bb1
  • @StupidWolf what do you mean by "I can't do levels"? Okay, I get the point that levels are inherited, but how is droplevels going to help me? – bb1 Mar 15 '20 at 19:19
  • do test <- droplevels(test); train <- droplevels(train); then sapply(colnames(train), function(i) levels(train[[i]]) == levels(test[[i]])) – StupidWolf Mar 15 '20 at 19:25
  • I'm just wondering how this is supposed to help me. According to the error message, the problem is that the levels are not the same. My attempt is based on this answer: https://stackoverflow.com/a/32623810/9417537 but it just doesn't seem to work. – bb1 Mar 16 '20 at 22:17
  • You can of course set the levels to be the same like in the post, but it's like sweeping everything under the carpet. I don't think it will solve the issue. – StupidWolf Mar 16 '20 at 22:22
  • Mhh... I found the column, but I still get the same error. I compared the levels and they are identical. I don't really understand why predict claims they are not. I literally compare them with all.equal. – bb1 Mar 16 '20 at 22:32
  • Ok, if you do table() for both train and test, are any of the levels missing? You get the error because you have a factor level present in test that is not seen in train. – StupidWolf Mar 16 '20 at 22:49
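
The droplevels/table() suggestion in the comments can be sketched as a per-column search for levels that occur in test but not in train. The data frames below are made up and `unseen_levels` is a hypothetical helper name. Whether this matches the exact check inside `predict.randomForest` is an assumption, and the model may have been fitted on data that differs from the current `train` object.

```r
# Made-up data: "green" occurs in test but never in train.
train <- data.frame(color = factor(c("red", "blue", "red")),
                    size  = factor(c("S", "M", "S")))
test  <- data.frame(color = factor(c("red", "green")),
                    size  = factor(c("S", "M")))

# Hypothetical helper: for each factor column of train, list the levels
# actually used in test that are not used in train.
unseen_levels <- function(train, test) {
  factor_vars <- names(train)[sapply(train, is.factor)]
  out <- sapply(factor_vars, function(v) {
    setdiff(levels(droplevels(test[[v]])),
            levels(droplevels(train[[v]])))
  }, simplify = FALSE)
  Filter(length, out)  # keep only columns with at least one unseen level
}

print(unseen_levels(train, test))

# Counts per level, side by side, for one suspicious column.
print(table(train$color))
print(table(test$color))
```

An empty list means every level used in test also occurs in train for the columns checked; any entry names a column and the levels to look at.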
