I have the following data and need to build a regression tree in R with the rpart library to predict the rental duration.

The output of str() is:

'data.frame':  1000 obs. of  6 variables:
 $ rental_duration : int  6 3 7 5 6 3 6 6 3 6 ...
 $ rental_rate     : num  0.99 4.99 2.99 2.99 2.99 2.99 4.99 4.99 2.99 4.99 ...
 $ length          : int  86 48 50 117 130 169 62 54 114 63 ...
 $ replacement_cost: num  21 13 19 27 23 ...
 $ rating          : Factor w/ 5 levels "G","NC-17","PG",..: 3 1 2 1 1 3 4 5 4 2 ...
 $ name            : Factor w/ 16 levels "Action","Animation",..: 6 11 6 11 8 9 5 11 11 15 ...

After running

library(rpart)

# Fit a regression tree (method = "anova") on all other columns
m1 <- rpart(formula = rental_duration ~ .,
            data    = training_set2,
            method  = "anova")

I get:

enter image description here

The problem appears when I plot the cross-validation error against the complexity parameter (cp). I expected the cross-validation error to decrease as cp decreases, but as you can see I get the opposite: the error goes up as cp gets smaller. I thought it might be caused by the factor columns, so I converted them to numeric, but nothing changed.

enter image description here
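
For reference, here is a minimal sketch of how the cross-validation table behind that kind of plot can be inspected. `training_set2` is not reproduced here, so `sim` is a hypothetical simulated stand-in with the same column names; its factor levels, value ranges, and the weak dependence on `rental_rate` are all made up for illustration. The setting `cp = 0.001` is an arbitrary choice to let the tree grow past the default of 0.01.

```r
library(rpart)

set.seed(42)
n <- 1000

# Hypothetical stand-in for training_set2 (values are simulated, not real)
rental_rate <- sample(c(0.99, 2.99, 4.99), n, replace = TRUE)
sim <- data.frame(
  # Weak, made-up dependence on rental_rate plus noise, clipped to 3..7
  rental_duration  = as.integer(pmin(7, pmax(3,
                       round(5 - 0.4 * (rental_rate - 2.99) + rnorm(n))))),
  rental_rate      = rental_rate,
  length           = sample(46:185, n, replace = TRUE),
  replacement_cost = sample(9:29, n, replace = TRUE) + 0.99,
  rating           = factor(sample(paste0("rating", 1:5), n, replace = TRUE)),
  name             = factor(sample(paste0("category", 1:16), n, replace = TRUE))
)

# Grow a deliberately large tree; xval = 10 requests 10-fold cross-validation
fit <- rpart(rental_duration ~ ., data = sim, method = "anova",
             control = rpart.control(cp = 0.001, xval = 10))

# One row per candidate subtree: CP, nsplit, rel error, xerror, xstd
cp_table <- fit$cptable
print(cp_table)

# plotcp(fit)  # uncomment to draw xerror against cp

# Pick the cp with the lowest cross-validated error and prune back to it
best_row <- which.min(cp_table[, "xerror"])
best_cp  <- cp_table[best_row, "CP"]
pruned   <- prune(fit, cp = best_cp)
```

In this table `rel error` (the training error relative to the root node) cannot increase as cp shrinks, but `xerror` carries no such guarantee: it can rise once additional splits start fitting noise, which is why the row with the smallest `xerror` is used for pruning here.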

Can someone tell me whether I am doing something wrong?

Phil
Alex