I'm working with the Boston Housing data set and building models with the tree package. Cross-validation can be used to find the optimal number of trees (the size value returned by cv.tree), as the last line shows (8 in this case):
library(tree)
library(MASS)
tree.test.RMSE <- 0
df <- MASS::Boston
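for (i in 1:5) {
  # Randomly assign each row to train (60%), test (20%), or validation (20%)
  idx <- sample(seq(1, 3), size = nrow(df), replace = TRUE, prob = c(.6, .2, .2))
  train <- df[idx == 1, ]
  test <- df[idx == 2, ]
  validation <- df[idx == 3, ]
  # Use medv rather than train$medv so that "." does not also pull medv in as a predictor
  tree.model <- tree::tree(medv ~ ., data = train)
  tree.model.cv <- cv.tree(tree.model)
  # Size (number of terminal nodes) with the lowest cross-validated deviance
  n.trees <- tree.model.cv$size[which.min(tree.model.cv$dev)]
}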
However, I can't find a way to use this optimal number of trees when fitting the tree model and predicting on the test and validation data sets. For example, this returns an error:
tree.model.1 <- tree::tree(medv ~ ., data = train, trees = n.trees)
Error in tree.control(nobs, ...) : unused argument (trees = 1)
and passing size to predict() makes no difference to the result. Including size:
tree.test.RMSE <- Metrics::rmse(actual = test$medv, predicted = predict(object = tree.model, newdata = test, size = tree.model.cv$size[which.min(tree.model.cv$dev)]))
[1] 5.117792
Not including size:
tree.test.RMSE <- Metrics::rmse(actual = test$medv, predicted = predict(object = tree.model, newdata = test))
[1] 5.117792
How can I build a tree model that uses the optimal number of trees?
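One possible direction, sketched under the assumption that the size reported by cv.tree is the number of terminal nodes of a single tree rather than a count of trees: prune the fitted tree to that size with prune.tree(best = ...) and predict from the pruned tree. The seed, the names best.size and tree.pruned, and the max(2, ...) guard (to keep at least one split) are illustrative choices, not part of the original code, and RMSE is computed in base R here instead of with Metrics::rmse.

```r
library(tree)
library(MASS)

set.seed(1)  # assumed seed, only so the random split is repeatable
df <- MASS::Boston
idx <- sample(seq(1, 3), size = nrow(df), replace = TRUE, prob = c(.6, .2, .2))
train <- df[idx == 1, ]
test <- df[idx == 2, ]

# Fit the full tree, then cross-validate over its pruning sequence
tree.model <- tree::tree(medv ~ ., data = train)
tree.model.cv <- tree::cv.tree(tree.model)

# Size (terminal nodes) with the lowest CV deviance; keep at least one split
best.size <- max(2, tree.model.cv$size[which.min(tree.model.cv$dev)])

# Prune the fitted tree to that size and predict from the pruned tree
tree.pruned <- tree::prune.tree(tree.model, best = best.size)
pred <- predict(tree.pruned, newdata = test)

# Root mean squared error on the test set (base R equivalent of an RMSE helper)
tree.test.RMSE <- sqrt(mean((test$medv - pred)^2))
tree.test.RMSE
```

If the selected size equals the size of the unpruned tree, the pruned and unpruned trees coincide, so the test RMSE would be the same with or without pruning.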