I am experimenting with rpart, using a practice data set from a bank with 5000 rows, 7 independent variables, and a class variable that is a factor with 2 levels.

The original model (no control arguments set) is:

UB_rpart <- rpart(UB_tree, method="class", data=UBank_train)

I then create a tree in rpart.plot without issue.

Then I try to prune the tree by adding some control arguments:

Pruned_UB_rpart <-prune(UB_rpart,cp=.01, minsplit=10, minbucket=round(minsplit/3))

I plot the pruned tree and it looks identical to the first tree. I keep changing minsplit (even up to 1000) to see when the tree changes, but it never does.

However when I change cp to .05, then the tree changes.

So why is minsplit not pruning the tree? Am I using it incorrectly?

mpg

1 Answer

Below is an explanation of pruning, using the kyphosis data as an example:

>printcp(rpart.kyphosis)  

printcp displays the CP table of a fitted rpart object, that is, a table of optimal prunings based on the complexity parameter. The table gives a brief summary of the overall fit of the model, and we can use it to decide whether the tree is appropriate or whether some branches should be pruned. It is printed from the smallest tree (0 splits) to the largest one (7 splits). The table always lists the number of splits rather than the number of terminal nodes (which is 1 + the number of splits).
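The table that printcp prints is also stored on the fitted object, so it can be inspected directly. The sketch below assumes the model was fit on the `kyphosis` data shipped with rpart using default controls; the answer does not show its fitting call, so the rows here may not match the table discussed in this answer.

```r
library(rpart)

# Assumption: a default fit on the built-in kyphosis data. The answer's own
# fitting call is not shown, so its table (0 to 7 splits) may differ.
rpart.kyphosis <- rpart(Kyphosis ~ Age + Number + Start,
                        data = kyphosis, method = "class")

# The CP table is a plain numeric matrix on the fitted object.
cp_table <- rpart.kyphosis$cptable
colnames(cp_table)

# One row per candidate tree, ordered from fewest to most splits.
cp_table[, "nsplit"]

# Terminal nodes for each candidate tree: number of splits + 1.
cp_table[, "nsplit"] + 1
```

Note that the `xerror` and `xstd` columns come from cross-validation, so they change from run to run unless a seed is set.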

There is a rule called the 1-SE rule for choosing the number of splits. Take the smallest xerror (1.2941) and add its corresponding standard error (0.2355), which gives 1.5296. Then find the tree with the fewest splits whose xerror is below this threshold. Here all 4 rows of the table fall within this range, so we take the one with the fewest splits (nsplit=0) and use its CP value (0.176) to prune.
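The 1-SE rule can also be applied programmatically rather than by reading the table. This is a minimal sketch, again assuming a default fit on the built-in `kyphosis` data; the names `fit`, `tab`, `threshold`, and `cp_1se` are illustrative, and the numbers it produces will not necessarily match the ones quoted above.

```r
library(rpart)

set.seed(1)  # xerror is cross-validated, so fix the seed for repeatability

# Assumption: default controls on the built-in kyphosis data.
fit <- rpart(Kyphosis ~ Age + Number + Start,
             data = kyphosis, method = "class")
tab <- fit$cptable

# Step 1: smallest cross-validated error plus its standard error.
best <- which.min(tab[, "xerror"])
threshold <- tab[best, "xerror"] + tab[best, "xstd"]

# Step 2: rows are ordered by increasing nsplit, so the first row within
# the threshold is the tree with the fewest splits that qualifies.
chosen <- which(tab[, "xerror"] <= threshold)[1]
cp_1se <- tab[chosen, "CP"]

# Step 3: prune at that complexity parameter.
fit_1se <- prune(fit, cp = cp_1se)
```

The comparison uses `<=` so that the minimum-error row itself always qualifies; with a strict `<` the result could be empty if the standard error were zero.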

>fit2 = prune(rpart.kyphosis,cp=0.176)
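As for minsplit in the question: as far as I know, prune() for rpart objects only uses the cp argument, and any other named arguments are accepted through `...` and ignored. minsplit and minbucket are growing-time controls that belong in rpart.control() when the tree is fit. The sketch below illustrates this on the built-in `kyphosis` data (an assumption, since the bank data is not available); the control values are arbitrary.

```r
library(rpart)

# Default fit: minsplit = 20, cp = 0.01.
fit_default <- rpart(Kyphosis ~ Age + Number + Start,
                     data = kyphosis, method = "class")

# To make minsplit/minbucket matter, set them when growing the tree.
# The values 60 and 20 are arbitrary, chosen only for illustration.
fit_strict <- rpart(Kyphosis ~ Age + Number + Start,
                    data = kyphosis, method = "class",
                    control = rpart.control(minsplit = 60, minbucket = 20))

# prune() only looks at cp; minsplit here falls into `...` and has no effect.
pruned_a <- prune(fit_default, cp = 0.01)
pruned_b <- prune(fit_default, cp = 0.01, minsplit = 1000)

# Both prunings should give the same tree.
identical(pruned_a$frame, pruned_b$frame)
```

This would also explain why cp=.01 left the tree unchanged: 0.01 is the default cp used when growing, so pruning at the same value typically removes nothing, while a larger value such as .05 does.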
Prasanna Nandakumar