I am very new to machine learning and am exploring random forests with the ranger library in R. My dependent variable is continuous, so this is a regression forest rather than a classification one. While trying out the functions, I noticed a discrepancy between the predictions stored in the fitted ranger object and those returned by predict(). The following lines produce different predictions in results and results_alternative:

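library(ranger)

rf_reg <- ranger(formula = y ~ ., data = training_df)

# Predictions stored on the fitted object
results <- rf_reg$predictions
# Predictions from predict() on the same training data
results_alternative <- predict(rf_reg, data = training_df)$predictions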

Could anybody please explain why there is a discrepancy and what is causing it? Which one is correct? I have tried it with classification on iris data and that seemed to give the same results. Many thanks!

Jhonny
  • I am writing this as a comment because I'm going by memory and intuition, but I believe it's just an in-sample vs. out-of-sample prediction difference, even though training_df is used for both. For instance, I would expect the first to be something like a stacked prediction method and the second to be the final predictions. The docs probably describe it, as well as any implicit partitioning settings. – Hack-R Feb 19 '23 at 18:28
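
The in-sample vs. out-of-sample distinction raised in the comment can be checked with a small sketch. The ranger documentation describes the `predictions` element of a fitted forest as based on out-of-bag samples, whereas `predict()` runs each row through every tree, including trees that were trained on that row. The toy data below (columns `x1`, `x2`, `y`, the sample size, and the seed) are assumptions made up for illustration and are not the asker's `training_df`.

```r
library(ranger)

set.seed(42)

# Hypothetical toy data standing in for training_df (not from the question)
n <- 200
training_df <- data.frame(x1 = runif(n), x2 = runif(n))
training_df$y <- 3 * training_df$x1 - 2 * training_df$x2 + rnorm(n, sd = 0.5)

rf_reg <- ranger(formula = y ~ ., data = training_df, seed = 42)

# Out-of-bag predictions: each row is predicted only by trees whose
# bootstrap sample did not contain that row
oob_pred <- rf_reg$predictions

# In-sample predictions: every tree contributes, including trees that
# saw the row during training
insample_pred <- predict(rf_reg, data = training_df)$predictions

# Mean squared error of each set of predictions against the observed y
oob_mse <- mean((training_df$y - oob_pred)^2)
insample_mse <- mean((training_df$y - insample_pred)^2)

print(c(oob = oob_mse, in_sample = insample_mse))

# For regression, ranger reports the out-of-bag MSE here; it should be
# close to oob_mse computed above
print(rf_reg$prediction.error)
```

If this reading is right, the in-sample error should come out noticeably smaller than the out-of-bag error, because the second set of predictions partly reuses data the trees were fitted on. Under that assumption neither result is "incorrect": the out-of-bag predictions are the more honest estimate of performance on unseen data, while `predict()` is what would be applied to new observations.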

0 Answers