How to set optimal number of trees

Question

I'm working with the Boston Housing data set, making models using trees. It's possible to calculate the optimal number of trees using cross-validation, as the last line shows (in this case 8 trees):

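library(tree)
library(MASS)

tree.test.RMSE <- 0
df <- MASS::Boston

for (i in 1:5) {
  # Randomly assign each row to train (60%), test (20%), or validation (20%)
  idx <- sample(seq(1, 3), size = nrow(df), replace = TRUE, prob = c(.6, .2, .2))
  train <- df[idx == 1, ]
  test <- df[idx == 2, ]
  validation <- df[idx == 3, ]

  # Fit a regression tree; use medv (not train$medv) so "." does not pull medv in as a predictor
  tree.model <- tree::tree(medv ~ ., data = train)
  tree.model.cv <- cv.tree(tree.model)

  # Value with the lowest cross-validated deviance
  n.trees <- tree.model.cv$size[which.min(tree.model.cv$dev)]
}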

However, I can't find any way to use this optimal value when fitting the tree model or when predicting on the test and validation data sets. For example, this returns an error:

tree.model.1 <- tree::tree(train$medv ~ ., data = train, trees = n.trees)
Error in tree.control(nobs, ...) : unused argument (trees = 1)

and this gives the same result whether or not size is included:

tree.test.RMSE <- Metrics::rmse(
  actual = test$medv,
  predicted = predict(object = tree.model, newdata = test,
                      size = tree.model.cv$size[which.min(tree.model.cv$dev)])
)
[1] 5.117792

Not including size:

tree.test.RMSE <- Metrics::rmse(actual = test$medv, predicted = predict(object = tree.model, newdata = test))
[1] 5.117792

How do I fit a tree model that uses the optimal number of trees?

?? Do you mean number of trees or *size* of tree (number of splits)?? — Ben Bolker, May 27 '23 at 14:59
You are correct - it's the size of the tree, the clarification is appreciated. How is the optimal size used to make tree models? — Russ Conte, May 27 '23 at 16:08
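
Following up on the size clarification, here is a minimal sketch of one possible approach, not a definitive answer. It assumes the goal is to cut the fitted tree back to the cross-validated size with `prune.tree(..., best = )` and then predict from the pruned tree, since `tree()` and `predict()` do not take a size argument themselves. The seed, the names `best.size`, `pruned.model`, `pruned.pred` and `pruned.rmse`, and the floor of 2 terminal nodes are illustrative assumptions, and RMSE is computed in base R rather than with Metrics.

```r
library(tree)
library(MASS)

set.seed(1)  # assumption: fixed seed so the random split is reproducible

df <- MASS::Boston
idx <- sample(seq(1, 3), size = nrow(df), replace = TRUE, prob = c(.6, .2, .2))
train <- df[idx == 1, ]
test <- df[idx == 2, ]

# Fit the full tree, then cross-validate over the pruning sequence
tree.model <- tree::tree(medv ~ ., data = train)
tree.model.cv <- tree::cv.tree(tree.model)

# Size (number of terminal nodes) with the lowest cross-validated deviance
best.size <- tree.model.cv$size[which.min(tree.model.cv$dev)]

# Prune the fitted tree back to that size.
# Assumption: keep at least 2 terminal nodes so the result is still a proper tree.
pruned.model <- tree::prune.tree(tree.model, best = max(best.size, 2))

# Predict from the pruned tree and compute RMSE on the test set
pruned.pred <- predict(pruned.model, newdata = test)
pruned.rmse <- sqrt(mean((test$medv - pruned.pred)^2))
pruned.rmse
```

If the cross-validated size equals the size of the full tree, pruning changes nothing and the RMSE matches the unpruned model. The same `pruned.model` can be passed to `predict()` with `newdata = validation` for the validation set.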
