For each observation in the data frame used to train a random forest model, there is a set of trees (roughly 1/3 of the trees in the forest, or about 37% under default bootstrap sampling) for which that observation was not in-bag. I would like a measure of the spread of these out-of-bag, tree-level predictions at each observation, ideally by retrieving the prediction from each individual tree.
Is there a way to do this for random forest models fit using the ranger package in R?
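library(ranger)
data("iris")
set.seed(42)  # fix the random split so the example is reproducible
# Row indices for an 80/20 split into training rows and "new" rows
iris_train <- sample(seq_len(nrow(iris)), size = floor(nrow(iris) * 0.8))
new_data <- setdiff(seq_len(nrow(iris)), iris_train)
rf <- ranger::ranger(formula = Sepal.Length ~ Sepal.Width + Petal.Length + Petal.Width + Species,
                     data = iris[iris_train, ])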
# OOB predictions (average only):
rf$predictions
Note that for new data, it is possible to get tree-level predictions from a random forest model using predict.ranger(..., predict.all=TRUE). I do not see such an option for returning in-sample but out-of-bag tree-level predictions.
# New data predictions (all trees):
p <- predict(rf, iris[new_data,], predict.all = TRUE)
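One workaround that might get close, offered as a sketch rather than a confirmed answer: ranger accepts keep.inbag = TRUE, which stores the per-tree in-bag counts in rf$inbag.counts. Predicting on the training rows with predict.all = TRUE and masking every (row, tree) pair where the row was in-bag should leave only out-of-bag tree-level predictions. The object names below (rf_oob, tree_pred, inbag, oob_pred, oob_sd, oob_mean) and the seed are made up for illustration.

```r
library(ranger)

set.seed(1)  # arbitrary seed, only so this sketch is reproducible
train <- iris[sample(nrow(iris), size = floor(nrow(iris) * 0.8)), ]

# keep.inbag = TRUE stores, for each tree, how often each training row was drawn
rf_oob <- ranger(Sepal.Length ~ ., data = train, keep.inbag = TRUE)

# n_train x num.trees matrix of per-tree predictions on the training rows
tree_pred <- predict(rf_oob, train, predict.all = TRUE)$predictions

# n_train x num.trees matrix of in-bag counts (inbag.counts has one element per tree)
inbag <- do.call(cbind, rf_oob$inbag.counts)

# Blank out predictions from trees that saw the row during training
oob_pred <- tree_pred
oob_pred[inbag > 0] <- NA

# Per-observation spread and mean of the remaining OOB tree-level predictions
oob_sd   <- apply(oob_pred, 1, sd, na.rm = TRUE)
oob_mean <- rowMeans(oob_pred, na.rm = TRUE)
```

If this masking is right, I would expect oob_mean to agree closely with rf_oob$predictions, which would be a useful sanity check, but I have not verified that this is how ranger computes its OOB average internally.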