I have the following data, and I need to build a regression tree in R with the rpart library to predict the rental duration (`rental_duration`).
The output of `str()` is:
```
'data.frame':	1000 obs. of  6 variables:
 $ rental_duration : int  6 3 7 5 6 3 6 6 3 6 ...
 $ rental_rate     : num  0.99 4.99 2.99 2.99 2.99 2.99 4.99 4.99 2.99 4.99 ...
 $ length          : int  86 48 50 117 130 169 62 54 114 63 ...
 $ replacement_cost: num  21 13 19 27 23 ...
 $ rating          : Factor w/ 5 levels "G","NC-17","PG",..: 3 1 2 1 1 3 4 5 4 2 ...
 $ name            : Factor w/ 16 levels "Action","Animation",..: 6 11 6 11 8 9 5 11 11 15
```
After running:

```r
library(rpart)

m1 <- rpart(formula = rental_duration ~ .,
            data = training_set2,
            method = "anova")
```
I get:
The issue is that when I plot the cross-validation error against the complexity parameter (cp), I expect a curve in which the cross-validation error decreases as cp decreases, but, as you can see, I get the opposite. I thought this might be caused by the factor variables, so I converted them to numeric, but nothing changed.
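For reference, the cross-validated error (`xerror`) in rpart's cp table is not guaranteed to keep falling as cp shrinks: it typically drops at first and then levels off or rises once further splits start fitting noise, and when the predictors carry little information about the response it can rise almost from the first split. The sketch below shows how to read the cp table and prune with it. It assumes a hypothetical simulated data frame `sim` (the columns `y`, `x1`, `x2` and `f` are made up for illustration, and `y` is pure noise), so it only demonstrates the mechanics and says nothing about the actual rental data.

```r
library(rpart)

set.seed(1)
n <- 1000
# Hypothetical data: the response is drawn independently of every
# predictor, so any split the tree makes can only fit noise.
sim <- data.frame(
  y  = rnorm(n),
  x1 = runif(n),
  x2 = runif(n),
  f  = factor(sample(letters[1:16], n, replace = TRUE))
)

# Grow a deliberately large tree (cp = 0.001) and keep the default
# 10-fold cross-validation (xval = 10)
fit <- rpart(y ~ ., data = sim, method = "anova",
             control = rpart.control(cp = 0.001, xval = 10))

printcp(fit)   # CP, nsplit, rel error, xerror and xstd for each subtree
plotcp(fit)    # cross-validated relative error (+/- 1 SE) against cp

cpt <- fit$cptable

# Subtree with the lowest cross-validated error
best_row <- which.min(cpt[, "xerror"])
cp_min   <- cpt[best_row, "CP"]

# 1-SE rule: smallest subtree whose xerror is within one xstd of the minimum
# (rows of the cp table are ordered from the smallest tree to the largest)
threshold <- cpt[best_row, "xerror"] + cpt[best_row, "xstd"]
cp_1se    <- cpt[which(cpt[, "xerror"] <= threshold)[1], "CP"]

# Cut back every split whose complexity falls at or below the chosen cp
pruned <- prune(fit, cp = cp_1se)
```

With a real model the same steps would apply to `m1$cptable`. The 1-SE choice is a common convention from CART rather than something rpart applies automatically; `cp_min` is the alternative if the tree with the lowest cross-validated error is preferred.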
Can someone give me a hint about what I might be doing wrong?