Same variable has a different class in the training and test sets in randomForest

Question

I trained a random forest model on a training set using the randomForest package in R. One variable in the training set was of class character, and I converted it with as.numeric(factor()).

However, the same variable in the test set is still character. To my surprise, I can still get predictions from the trained model even though that variable is of class character. I also found that if I convert that variable in the test set with as.numeric(factor()) as well, the performance on the test set is different.

Does anyone know how R and the random forest model interpret and handle a character variable in the test set when the same variable in the training set is not of class character?

Thanks in advance!!

score 0 · Answer 1 · answered Aug 05 '20 at 11:04

This is not overly surprising. Your original variables are character strings that can be converted to numerics, so randomForest is almost certainly doing exactly that. The simple example below reproduces the problem:

library(tibble)
library(randomForest)

# Toy regression data with a single numeric predictor
df <- tibble(x = 1:6, y = 1:6)

rf <- randomForest(y ~ ., data = df)
# "1" is coerced
predict(rf, tibble(x = "1"))
predict(rf, tibble(x = 1))
# "b" fails
predict(rf, tibble(x = "b"))
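One possible reason the test-set performance changes after conversion, sketched here under the assumption that the variable holds non-numeric labels: as.numeric(factor()) assigns integer codes from the levels present in each vector, so converting the training and test sets separately can map the same string to different codes. The vectors `train_chr` and `test_chr` below are hypothetical stand-ins, not your data, and the sketch uses base R only.

```r
# Hypothetical character columns standing in for the variable in question
train_chr <- c("low", "mid", "high", "mid")
test_chr  <- c("mid", "high")

# Converting each set on its own: levels are sorted within each vector,
# so the same string can receive a different integer code
train_num  <- as.numeric(factor(train_chr))  # levels: high, low, mid
test_alone <- as.numeric(factor(test_chr))   # levels: high, mid

# Reusing the training levels keeps the codes consistent across both sets
train_levels <- levels(factor(train_chr))
test_shared  <- as.numeric(factor(test_chr, levels = train_levels))

# "mid" is coded 3 in training, 2 when the test set is converted alone,
# and 3 again when the training levels are reused
print(train_num)
print(test_alone)
print(test_shared)
```

If this is what is happening, building the factor once from the training levels and applying those same levels to the test set should make the two conversions agree.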

Robert Wilson

Why does `predict(rf, tibble(x = "b"))` give the same result as the last two? It's weird – Ian Aug 05 '20 at 11:38
I see. It always returns the same result as the last two – Ian Aug 05 '20 at 11:46
I'm not sure what you mean. `predict(rf, tibble(x = "b"))` returns an error – Robert Wilson Aug 05 '20 at 13:33
It did not on my end. It gives the same result as the last two – Ian Aug 10 '20 at 23:39
In that case, it is most likely something going wrong with your package installs (update or reinstall everything), or a system-level difference, with randomForest behaving differently on different operating systems – Robert Wilson Aug 12 '20 at 17:08
