I have a training data set, train, with distance and dest_zip_code as predictor variables to predict delivery_days. I am using the ranger random forest package to fit a quantile regression forest. Please note that the dest_zip_code factor levels in the training data are based on 6 months of data.

Now, I have two test sets, test_A and test_B, that contain identical observations:

  • test_A has dest_zip_code values from the last 2 months, and its factor levels are also based on the last 2 months.
  • test_B has dest_zip_code values from the last 2 months, but its factor levels are re-factored to the last 6 months (the same levels as the training data).
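
To make the setup concrete, here is a minimal base-R sketch with made-up zip codes (the real data are not shown, so train_zips and test_zips are purely hypothetical). It shows that the two test factors carry identical labels but different underlying integer codes. If the model consumes the factor through those integer codes, which as far as I understand is what ranger does under its default respect.unordered.factors = "ignore" setting, then the same zip code would reach the trees as a different number in test_A and test_B.

```r
# Hypothetical zip codes; stand-ins for the real data.
train_zips <- c("10001", "20002", "30003", "40004", "50005", "60006")  # 6 months
test_zips  <- c("20002", "50005", "60006")                              # last 2 months

# test_A: levels derived from the test data alone
test_A <- factor(test_zips)

# test_B: same observations, levels forced to match the training data
test_B <- factor(test_zips, levels = sort(unique(train_zips)))

# The labels are identical ...
print(identical(as.character(test_A), as.character(test_B)))

# ... but the underlying integer codes differ
print(as.integer(test_A))
print(as.integer(test_B))
```

With these example values, test_A is coded 1, 2, 3 while test_B is coded 2, 5, 6, even though both hold the same three zip codes.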

When I use the predict function on both test sets with the same trained model object, at least half of the predictions are different.

  1. Can someone help me understand how different factor levels on test data with the same observations affect the predictions?
  2. Which one is theoretically correct?
Murali

0 Answers