I have trained a model using the random forest algorithm. Now I want to use this model to predict results for a data set that contains only one record.

When I tried to run the predict command, it threw the following error.

Error in predict.randomForest(model, test1, type = "response") : Type of predictors in new data do not match that of the training data.

I noticed that this happens because the factor variables have different levels in the training and testing data frames.

So I found a solution on Stack Overflow that modifies the levels with this script:

common <- intersect(names(train), names(test1))
for (p in common) {
  if (class(train[[p]]) == "factor") {
    levels(test1[[p]]) <- levels(train[[p]])
  }
}

Please refer to the linked question:

r random forest error - type of predictors in new data do not match

Unfortunately, it also changes the values in the data for most of the variables.

For example:

In the test1 data frame there is a variable named "Category" with the value ">=100"; after running the script it changes to "11-50".

user3734568

1 Answer

We only need to change the levels for the columns of class factor:

nm1 <- names(which(sapply(train, is.factor)))
for (p in nm1) {
    levels(test1[[p]]) <- levels(train[[p]])
}

If the model is a randomForest object, we don't even have to look at the training data. Get the xlevels from the model object and assign the levels of the 'test1' columns based on them:

lvlslst <- model[["forest"]][["xlevels"]]
lvlsCols <- names(lvlslst)[sapply(lvlslst, is.character)]
for (j in lvlsCols) {
   levels(test1[[j]]) <- lvlslst[[j]]
}
akrun
  • Thanks for your response. I tried both solutions you provided, but they also change the values in the test data frame. – user3734568 Aug 04 '17 at 06:36
  • @user3734568 I am only assigning the `levels` as in your code and not changing anything. You have to check your dataset – akrun Aug 04 '17 at 08:18
  • Thanks for your response. I think the cause is that my test1 data has only one row, so Category has a single level, ">=100", while my train data has 400 records and Category has 3 levels: "11-50", "51-100" and ">=100". It seems that when I use levels(test1[[p]]) <- levels(train[[p]]), the single level in test1 is matched to the first level of the train data, so the value is replaced with "11-50". – user3734568 Aug 04 '17 at 11:30
  • @user3734568 randomForest requires the levels to be the same for training and test. Please check if there are leading/trailing spaces in those levels in either dataset – akrun Aug 04 '17 at 11:55
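
The behaviour described in the comments can be reproduced in base R. The sketch below uses made-up data frames (a three-level `Category` in `train`, a one-row `test1`) that only mimic the ones in the question. It shows that `levels(x) <- ...` renames the existing levels by position, and it tries one possible alternative that was not proposed in the thread: rebuilding the factor with `factor(as.character(x), levels = ...)`, which keeps the values and only extends the level set. Any test value that is absent from the training levels would become `NA` with this approach.

```r
# Hypothetical stand-ins for the data frames in the question.
train <- data.frame(
  Category = factor(c("11-50", "51-100", ">=100"),
                    levels = c("11-50", "51-100", ">=100"))
)
test1 <- data.frame(Category = factor(">=100"))  # one row, one level

# `levels<-` relabels existing levels by position: the only level in
# test1 (">=100") is renamed to the first training level ("11-50").
renamed <- test1
levels(renamed$Category) <- levels(train$Category)
as.character(renamed$Category)   # "11-50", the value has changed

# Rebuilding the factor from its character values keeps the value and
# gives it the same level set (and order) as the training data.
fixed <- test1
for (p in intersect(names(train), names(fixed))) {
  if (is.factor(train[[p]])) {
    fixed[[p]] <- factor(as.character(fixed[[p]]),
                         levels = levels(train[[p]]))
  }
}
as.character(fixed$Category)     # ">=100", the value is preserved
identical(levels(fixed$Category), levels(train$Category))  # TRUE
```

If only the model object is at hand, the same rebuild could presumably use the level vectors from `model[["forest"]][["xlevels"]]` in place of `levels(train[[p]])`.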