I was running a simple test of how to build a regression tree with rpart, and I found a surprising behaviour in R with my hand-made data:
- when growing a tree with maxdepth = 1 --> no split is done!
- when growing a tree with maxdepth = 2 --> 2 splits are done!
Why is no split done with maxdepth = 1? I guess some parameter of the rpart function is "blocking the growth", but which one?
Here are the data used:
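# packages used below
library(dplyr)  # provides %>% and mutate()
library(rpart)  # provides rpart() and rpart.control()

# generate some data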
set.seed(1234)
x <- runif(200, min=0, max=1)
y <- runif(200, min=0, max=1)
mydf <- cbind.data.frame(x, y)
mydf <- mydf %>% mutate(target = ifelse(
  ((x > 0.2) & (x < 0.5) | (x > 0.7) & (x < 0.9)) & (y > 0.1) & (y < 0.8), 1, 0))
# to look at data that was generated
plot(mydf$x, mydf$y,
main = "Observations (red triangles stand for Target = 1)", #title
col = mydf$target + 1, #colours defined by an integer (in that case 1 or 2)
pch = 16 + mydf$target)
abline(v = c(0.2, 0.5, 0.7, 0.9), lty = 2, col = "grey")
abline(h = c(0.1, 0.8), lty = 2, col = "grey")
# grow a tree
mydf$target_factor <- as.factor(ifelse(mydf$target == 1, "success", "failure"))
predictors <- c("x", "y")
predictors <- paste(predictors,collapse = "+")
formula <- paste("target_factor",predictors,sep="~")
formula <- as.formula(formula)
myregressiontree <- rpart(formula, data = mydf, control = rpart.control(maxdepth = 1))
print(myregressiontree)
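To make the comparison between the two depths easy to reproduce, here is a minimal self-contained sketch that fits the same model with maxdepth = 1 and maxdepth = 2 and counts the splits in each tree. The helper `count_splits` and the names `in_band`, `fit_depth1` and `fit_depth2` are mine (hypothetical, not part of rpart), and the data is rebuilt with base R instead of dplyr. The last line prints the control settings stored on the fitted object, which is where I would look for whatever is "blocking the growth".

```r
library(rpart)

# Same data as above, rebuilt with base R only
set.seed(1234)
x <- runif(200, min = 0, max = 1)
y <- runif(200, min = 0, max = 1)
in_band <- ((x > 0.2 & x < 0.5) | (x > 0.7 & x < 0.9)) & (y > 0.1 & y < 0.8)
mydf <- data.frame(
  x = x,
  y = y,
  target_factor = factor(ifelse(in_band, "success", "failure"))
)

# Hypothetical helper: every row of tree$frame that is not a leaf is a split
count_splits <- function(tree) sum(tree$frame$var != "<leaf>")

# Fit the same model twice, changing only maxdepth
fit_depth1 <- rpart(target_factor ~ x + y, data = mydf,
                    control = rpart.control(maxdepth = 1))
fit_depth2 <- rpart(target_factor ~ x + y, data = mydf,
                    control = rpart.control(maxdepth = 2))

cat("splits with maxdepth = 1:", count_splits(fit_depth1), "\n")
cat("splits with maxdepth = 2:", count_splits(fit_depth2), "\n")

# Control parameters actually used for the fit (defaults included)
str(fit_depth1$control)
```

Every argument not set explicitly in `rpart.control()` keeps its default, so the `str()` output lists all of the settings that were in effect besides maxdepth.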