Different predictions for the same test set data having different levels in the factor variables in R (Ranger)

Asked Jun 03 '20 at 09:22

Active Jun 03 '20 at 10:03

Viewed 121 times

I have a training data set `train` with distance and dest_zip_code as predictor variables for predicting delivery_days. I am using the ranger random forest package to fit a quantile regression forest. Note that the factor levels of dest_zip_code in the training data are based on 6 months of data.

Now I have two test sets, test_A and test_B, that contain identical observations:

test_A has dest_zip_code values from the last 2 months, and its factor levels are also based on the last 2 months only.
test_B has the same dest_zip_code values from the last 2 months, but its factor levels are re-factored to the full 6 months (the same levels as the training data).
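
To make the difference between the two test sets concrete, here is a minimal base R sketch. The zip codes are made up for illustration (the real data is not shown here). It shows that the two factors carry identical labels but different underlying integer codes. If the prediction step works on the integer codes rather than on the labels (an assumption about ranger's internals, not something confirmed here), the same zip code could be read as a different category in test_A.

```r
# Hypothetical zip codes; the real values are not part of the question.
train_zips  <- c("10001", "20002", "30003", "40004", "50005", "60006")  # seen over 6 months
recent_zips <- c("20002", "40004", "60006", "40004")                    # seen in the last 2 months

# test_A style: levels are inferred from the 2-month data only
zip_A <- factor(recent_zips)

# test_B style: same values, but levels are taken from the training data
zip_B <- factor(recent_zips, levels = sort(unique(train_zips)))

# The labels are identical ...
same_labels <- identical(as.character(zip_A), as.character(zip_B))

# ... but the underlying integer codes are not
codes_A <- as.integer(zip_A)  # 1 2 3 2
codes_B <- as.integer(zip_B)  # 2 4 6 4
```

Here "20002" is code 1 in zip_A but code 2 in zip_B, because the code is just the position of the label in that factor's level set.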

When I call predict() on both test sets with the same trained model object, at least half of the predictions differ.
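
A rough way to reproduce the comparison on synthetic data is sketched below. All values are made up. Only the column names follow the question, and the model formula and the choice of the 0.5 quantile are assumptions. The ranger part runs only if the package is installed, and whether any predictions differ may depend on the ranger version and its factor-handling settings.

```r
# Synthetic stand-in data; only the column names come from the question.
set.seed(1)
zips <- sprintf("%05d", seq(10000, 10900, by = 100))  # 10 hypothetical zip codes

train <- data.frame(
  distance      = runif(500, 1, 100),
  dest_zip_code = factor(sample(zips, 500, replace = TRUE), levels = zips)
)
train$delivery_days <- 1 + train$distance / 20 +
  as.integer(train$dest_zip_code) / 2 + rnorm(500)

# "Last 2 months": only a subset of the zip codes occurs
recent <- data.frame(
  distance         = runif(50, 1, 100),
  dest_zip_code    = sample(zips[c(2, 5, 9)], 50, replace = TRUE),
  stringsAsFactors = FALSE
)

# test_A: levels from the recent data only; test_B: levels from train
test_A <- transform(recent, dest_zip_code = factor(dest_zip_code))
test_B <- transform(recent, dest_zip_code = factor(dest_zip_code,
                                                   levels = levels(train$dest_zip_code)))

if (requireNamespace("ranger", quietly = TRUE)) {
  # Quantile regression forest
  rf <- ranger::ranger(delivery_days ~ distance + dest_zip_code,
                       data = train, quantreg = TRUE)
  pred_A <- predict(rf, data = test_A, type = "quantiles", quantiles = 0.5)$predictions
  pred_B <- predict(rf, data = test_B, type = "quantiles", quantiles = 0.5)$predictions
  # Number of rows whose median prediction differs between the two test sets
  n_diff <- sum(abs(pred_A - pred_B) > 1e-8)
  print(n_diff)
}
```

The rows of test_A and test_B are the same observations in the same order, so any nonzero count can only come from the differing level sets.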

Can someone help me understand how different factor levels in the test data, with otherwise identical observations, affect the predictions?
Which of the two sets of predictions is theoretically correct?

edited Jun 03 '20 at 10:03


Murali

Were you able to get an answer to this? – rAmAnA Jul 19 '22 at 16:36


0 Answers