I am building a regression tree with rpart and am looking for the best way to jointly tune the four main parameters: cost complexity (cp), maximum depth (maxdepth), minimum split size (minsplit), and minimum bucket size (minbucket).
I know there are ways to choose the cost-complexity parameter (cp) on its own, but how can I determine all four together so that the final model has the lowest error?
Reproducible example below:
library(rpart)
library(MASS) # provides the Boston data set
set.seed(1234)
train_index <- sample(nrow(Boston),0.75*nrow(Boston))
boston_train <- Boston[train_index,]
boston_test <- Boston[-train_index,]
# The values below are placeholders with no real significance
prune_control <- rpart.control(maxdepth = 5, cp = 0.005, minbucket = 20, minsplit = 20)
boston.rpart <- rpart(medv ~ .,data = boston_train, method = "anova", control = prune_control)
train_pred <- predict(object = boston.rpart)
test_pred <- predict(boston.rpart, boston_test)
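To make "least error" concrete, here is a minimal sketch of how the train and test error of the fit above could be measured. It assumes root mean squared error (RMSE) is the metric of interest, which the question does not actually specify, and `rmse()` is a hypothetical helper written for illustration, not a function from rpart. The `floor()` call only makes explicit the truncation that `sample()` already applies to a non-integer size.

```r
library(rpart)
library(MASS)  # provides the Boston data set

set.seed(1234)
train_index <- sample(nrow(Boston), floor(0.75 * nrow(Boston)))
boston_train <- Boston[train_index, ]
boston_test <- Boston[-train_index, ]

# Same placeholder settings as in the example above
prune_control <- rpart.control(maxdepth = 5, cp = 0.005, minbucket = 20, minsplit = 20)
boston.rpart <- rpart(medv ~ ., data = boston_train, method = "anova", control = prune_control)

# Hypothetical helper (assumption): RMSE as the error to minimize
rmse <- function(actual, predicted) sqrt(mean((actual - predicted)^2))

# predict() without newdata returns the fitted values for the training rows
train_rmse <- rmse(boston_train$medv, predict(boston.rpart))
test_rmse <- rmse(boston_test$medv, predict(boston.rpart, boston_test))

print(c(train = train_rmse, test = test_rmse))
```

Any tuning procedure for the four parameters would then be compared on a number like `test_rmse`, or preferably on a cross-validated version of it, so that the test set is not used to pick the parameters.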