I'm working with the Boston Housing data set, building models with trees. Cross-validation can be used to find the optimal number of trees, as the line that assigns n.trees shows (in this case 8 trees):

library(tree)
library(MASS)

tree.test.RMSE <- 0
df <- MASS::Boston

for (i in 1:5) {
  # Randomly assign each row to train (60%), test (20%), or validation (20%)
  idx <- sample(seq(1, 3), size = nrow(df), replace = TRUE, prob = c(.6, .2, .2))
  train <- df[idx == 1, ]
  test <- df[idx == 2, ]
  validation <- df[idx == 3, ]

  # Fit a regression tree; writing medv ~ . keeps medv out of the predictors
  tree.model <- tree::tree(medv ~ ., data = train)
  # Cross-validate, then take the size with the lowest deviance
  tree.model.cv <- cv.tree(tree.model)
  n.trees <- tree.model.cv$size[which.min(tree.model.cv$dev)]
}

However, I can't find a way to use the optimal number of trees when fitting the model or when predicting on the test and validation data sets. For example, this returns an error:

tree.model.1 <- tree::tree(train$medv ~ ., data = train, trees = n.trees)
Error in tree.control(nobs, ...) : unused argument (trees = 1)

and passing size to predict() has no effect on the result, whether it is included or not:

tree.test.RMSE <- Metrics::rmse(actual = test$medv, predicted = predict(object = tree.model, newdata = test, size = tree.model.cv$size[which.min(tree.model.cv$dev)]))
[1] 5.117792

Not including size:

tree.test.RMSE <- Metrics::rmse(actual = test$medv, predicted = predict(object = tree.model, newdata = test))
[1] 5.117792

How can I build a tree model that uses the optimal number of trees?
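
One possible route, sketched under a few assumptions: in the tree package, the size component returned by cv.tree() is the number of terminal nodes of each candidate subtree rather than a count of trees, and prune.tree() accepts a best argument that asks for a subtree of that size. The identical RMSE values above suggest that predict() simply ignores size. The fixed seed, the single split (instead of the loop of 5), the names best.size, tree.model.pruned, pruned.test.RMSE and pruned.validation.RMSE, and the base-R RMSE used in place of Metrics::rmse are all choices made for this illustration, not part of the original code.

```r
library(tree)
library(MASS)

set.seed(1)  # assumption: fixed seed so the split and CV folds are reproducible
df <- MASS::Boston

# Same 60/20/20 split as in the question, done once here for brevity
idx <- sample(seq(1, 3), size = nrow(df), replace = TRUE, prob = c(.6, .2, .2))
train <- df[idx == 1, ]
test <- df[idx == 2, ]
validation <- df[idx == 3, ]

# Grow the full tree on the training rows
tree.model <- tree::tree(medv ~ ., data = train)

# Cross-validate the cost-complexity pruning sequence
tree.model.cv <- tree::cv.tree(tree.model)

# $size holds the number of terminal nodes of each subtree in the sequence;
# pick the size with the lowest cross-validated deviance
best.size <- tree.model.cv$size[which.min(tree.model.cv$dev)]

# Assumption: keep at least one split so the result is a regular tree
best.size <- max(best.size, 2)

# Prune the full tree down to the chosen number of terminal nodes
tree.model.pruned <- tree::prune.tree(tree.model, best = best.size)

# Evaluate the pruned tree on the held-out sets (RMSE computed in base R)
test.pred <- predict(tree.model.pruned, newdata = test)
pruned.test.RMSE <- sqrt(mean((test$medv - test.pred)^2))

validation.pred <- predict(tree.model.pruned, newdata = validation)
pruned.validation.RMSE <- sqrt(mean((validation$medv - validation.pred)^2))

print(c(best.size = best.size,
        test = pruned.test.RMSE,
        validation = pruned.validation.RMSE))
```

If cross-validation happens to select the full tree, the pruned model and tree.model should be the same and the RMSE will not change. Inside the original loop, the same three steps (tree, cv.tree, prune.tree) would be repeated for each resample.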

Russ Conte