Having a problem similar to this, I am trying to force rpart to do exactly one split. Here is a toy example that reproduces my problem:

library(rpart)

y <- factor(c(1,1,1,1,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0))
x1 <- c(12,18,15,10,10,10,20,6,7,34,7,11,10,22,4,19,10,8,13,6,7,47,6,15,7,7,21,7,8,10,15)
x2 <- c(318,356,341,189,308,236,290,635,550,287,261,472,282,262,1153,435,402,182,415,544,251,281,378,498,142,566,152,560,284,213,326)

data <- data.frame(y=y,x1=x1,x2=x2)
tree <-rpart(y~.,
              data=data,
              control=rpart.control(maxdepth=1, # at most 1 split
                                    cp=0, # any positive improvement will do
                                    minsplit=1,
                                    minbucket=1, # even leaves with 1 point are accepted
                                    xval=0)) # I don't need crossvalidation
length(tree$frame$var) # == 1: only the root node, so there are no splits

Isolating a single point should be possible (minbucket=1) and even the most marginal improvement (isolating one point always decreases the misclassification rate) should lead to the split being kept (cp=0).

Why does the result not include any splits? And how do I have to alter the code to always get exactly one split? Can it be that splits are not kept if both classify to the same factor output?

user1965813
  • A reproducible example would be helpful. – ARobertson Feb 27 '15 at 18:45
  • @ARobertson Okay, here you go. I hope you can reproduce the problem. – user1965813 Mar 02 '15 at 10:25
  • I would recommend using `ctree` from the `party` package instead. It runs significance tests for each split (which helps to prevent overfitting) and also has a `stump` argument in `ctree_control` that does exactly what you need. – David Arenburg Mar 02 '15 at 10:38
  • @DavidArenburg I will certainly look into it, for general classification. That said, as in the end I am trying to use boosting (as in the linked problem), I would prefer a solution that keeps letting me use rpart. – user1965813 Mar 02 '15 at 12:14

1 Answer

Change cp = 0 to cp = -1.

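Apparently, when the tree is grown with maxdepth = 3, the cp reported for the first split is 0.0000000. With cp = 0 that split does not clear the threshold and is dropped; a negative cp lets it survive, so it shows up with maxdepth = 1.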
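A minimal sketch of the change, reusing the toy data from the question. The names `d`, `fit_stump`, and `n_splits` are only illustrative helpers introduced here; the one substantive difference from the question's code is the value of cp.

```r
library(rpart)

# Toy data from the question
y  <- factor(c(1,1,1,1,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0))
x1 <- c(12,18,15,10,10,10,20,6,7,34,7,11,10,22,4,19,10,8,13,6,7,47,6,15,7,7,21,7,8,10,15)
x2 <- c(318,356,341,189,308,236,290,635,550,287,261,472,282,262,1153,435,402,182,415,544,251,281,378,498,142,566,152,560,284,213,326)
d  <- data.frame(y = y, x1 = x1, x2 = x2)

# Illustrative helper (not part of rpart): fit a depth-1 tree for a given cp
fit_stump <- function(cp) {
  rpart(y ~ ., data = d,
        control = rpart.control(maxdepth = 1, cp = cp,
                                minsplit = 1, minbucket = 1, xval = 0))
}

# Illustrative helper: count internal (non-leaf) nodes in the fitted tree
n_splits <- function(tree) sum(tree$frame$var != "<leaf>")

tree_zero <- fit_stump(0)   # the question's setting
tree_neg  <- fit_stump(-1)  # the setting suggested in this answer

n_splits(tree_zero)
n_splits(tree_neg)
```

Printing `tree_neg` shows which variable and threshold were chosen, and `tree_neg$cptable` lists the cp value associated with the split.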

ARobertson