I've found something strange (at least to me) when using the rpart and adabag packages in R (R version 3.5.1 (2018-07-02) -- "Feather Spray").

I'm wondering why I get different trees from the two packages even though the parametrization is the same. Take a look at the code below:

library(rpart)
library(adabag)
set.seed(32323)

# Simulate two correlated predictors and a binary response z
N <- 1000
x <- rnorm(N)
y <- 0.6^2 * x + sqrt(1 - 0.6^2) * rnorm(N)
z <- rep(0, N)
for (i in 1:N) {
  if (x[i] - y[i] + 0.2 * rnorm(1) > 1.0) {
    z[i] <- 1
  }
}

myData <- data.frame(x, y, z)

# Single tree fitted directly with rpart (numeric response)
tree <- rpart(formula = z ~ ., data = myData, method = "anova", cp = 0, maxdepth = 10, minbucket = 30, xval = 10)
plot(tree, uniform = TRUE, compress = TRUE)
text(tree, use.n = FALSE, all = FALSE)
print(tree)

# One boosting iteration with adabag (factor response, no bootstrap resampling)
myData.Ada <- myData
myData.Ada$z <- as.factor(myData$z)
adaboost <- boosting(z ~ ., data = myData.Ada, boos = FALSE, mfinal = 1, coeflearn = "Breiman", control = rpart.control(method = "anova", cp = 0, maxdepth = 10, minbucket = 30, xval = 10))
plot(adaboost$tree[[1]], uniform = TRUE, compress = TRUE)
text(adaboost$tree[[1]], use.n = FALSE, all = FALSE)
print(adaboost$tree[[1]])

To me the parametrization is the same, but the trees are different. As far as I know, adabag uses rpart to build its trees, so what is the reason for this?

Regards, Wojtek

  • rpart creates a single tree, while boosting creates a number of trees where each new tree tries to minimize the prediction error of the previous model. By including the new tree in the overall model the model grows gradually, and thus allows for more accurate predictions than a single tree. – Wolf Oct 22 '18 at 11:50
  • Yes, that's why I set the number of trees in adaboost to 1, to compare these two methods. – w.starosta Oct 22 '18 at 14:20
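
One way to check whether the two fits in the question really share a parametrization is to look at what ends up in the control list and which method rpart reports for each tree. The sketch below uses only rpart and its own small simulated data set (the seed, `n`, and the names `d_num`, `d_fac`, `ctrl`, `w` are made up for this illustration). It rests on two assumptions that may not hold for every version: that `method` is not an argument of `rpart.control()`, so it is absorbed by `...` and dropped, and that adabag forwards only the control list plus observation weights to `rpart()`, which would then choose the method from the type of the response.

```r
library(rpart)

set.seed(1)  # arbitrary seed for this illustration
n <- 500
x <- rnorm(n)
y <- rnorm(n)
z <- as.integer(x - y + 0.2 * rnorm(n) > 1)

d_num <- data.frame(x, y, z)              # numeric response
d_fac <- data.frame(x, y, z = factor(z))  # factor response, as in myData.Ada

# Build the control list the same way as in the question and check
# whether `method` survives in the result.
ctrl <- rpart.control(method = "anova", cp = 0, maxdepth = 10,
                      minbucket = 30, xval = 10)
has_method <- "method" %in% names(ctrl)
print(has_method)

# Fit 1: method passed to rpart() itself, numeric response.
fit_anova <- rpart(z ~ ., data = d_num, method = "anova", control = ctrl)

# Fit 2: factor response, no method given, equal observation weights.
# This mimics (as an assumption) what a first boosting round would do.
w <- rep(1 / n, n)
fit_class <- rpart(z ~ ., data = d_fac, weights = w, control = ctrl)

# Which splitting method did each fit actually use?
print(fit_anova$method)
print(fit_class$method)
```

If `has_method` is FALSE and the two fits report different methods, the trees are being grown with different splitting criteria (regression versus classification), which alone could explain different splits. Applying the same check to `adaboost$tree[[1]]$method` from the question would confirm or rule this out for adabag.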
