I was running a simple test of how to grow a regression tree with rpart and found a surprising behaviour with my hand-made data:
- when growing a tree with maxdepth = 1, no split is made!
- when growing a tree with maxdepth = 2, 2 splits are made!

Why is no split made with maxdepth = 1? I guess some parameter of the rpart function is "blocking the growth", but which one?

Here are the data used:

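# load the packages used below
library(dplyr)  # provides %>% and mutate()
library(rpart)  # provides rpart() and rpart.control()

# generate some data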
set.seed(1234)
x <- runif(200, min=0, max=1)
y <- runif(200, min=0, max=1)
mydf <- cbind.data.frame(x, y)
mydf <- mydf %>% mutate(target = ifelse(
  ((x > 0.2) & (x < 0.5) | (x > 0.7) & (x < 0.9)) & (y > 0.1) & (y < 0.8), 1, 0))

# to look at data that was generated
plot(mydf$x, mydf$y, 
     main = "Observations (red triangles stand for Target = 1)", #title
     col = mydf$target + 1, #colours defined by an integer (in that case 1 or 2)
     pch = 16 + mydf$target)
abline(v = c(0.2, 0.5, 0.7, 0.9), lty = 2, col = "grey") 
abline(h = c(0.1, 0.8), lty = 2, col = "grey") 

# grow a tree
mydf$target_factor <- as.factor(ifelse(mydf$target == 1, "success", "failure"))
predictors <- c("x", "y")
predictors <- paste(predictors,collapse = "+")
formula <- paste("target_factor",predictors,sep="~")
formula <- as.formula(formula)

myregressiontree <- rpart(formula, data = mydf, control = rpart.control(maxdepth = 1))
print(myregressiontree)
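
One way to narrow down which setting is "blocking the growth" is sketched below. It is only a diagnostic under an assumption, not a confirmed explanation: it assumes the complexity parameter `cp` of `rpart.control()` (default 0.01) is the relevant setting. The documentation describes `cp` as the threshold below which a split that does not decrease the overall lack of fit is not attempted. The helper name `count_splits` is made up for illustration. Note also that because `target_factor` is a factor, rpart fits a classification tree here rather than a regression tree.

```r
library(rpart)

# Rebuild the data from the question (base R only, so dplyr is not needed here)
set.seed(1234)
x <- runif(200, min = 0, max = 1)
y <- runif(200, min = 0, max = 1)
in_x_band <- (x > 0.2 & x < 0.5) | (x > 0.7 & x < 0.9)
in_y_band <- y > 0.1 & y < 0.8
mydf <- data.frame(x, y)
mydf$target_factor <- factor(ifelse(in_x_band & in_y_band, "success", "failure"))

# Hypothetical helper: fit a tree and count the internal (non-leaf) nodes it kept
count_splits <- function(maxdepth, cp = 0.01) {
  fit <- rpart(target_factor ~ x + y, data = mydf,
               control = rpart.control(maxdepth = maxdepth, cp = cp))
  sum(fit$frame$var != "<leaf>")
}

depth1_default <- count_splits(maxdepth = 1)           # default cp = 0.01
depth2_default <- count_splits(maxdepth = 2)           # default cp = 0.01
depth1_forced  <- count_splits(maxdepth = 1, cp = -1)  # assumption: cp is the blocker

print(c(depth1_default = depth1_default,
        depth2_default = depth2_default,
        depth1_forced  = depth1_forced))
```

If the forced fit keeps a split where the default fit keeps none, that would point at `cp`: a single split may not reduce the misclassification error enough on its own, while two levels of splits together do. Calling `printcp()` on the maxdepth = 2 fit is another way to see how much each split contributes.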
chapelon