I am struggling a bit with a specific problem.
I want to model a decision tree.
My data set currently looks like this:
> str(bWeightSantiago)
'data.frame': 160487 obs. of 13 variables:
 $ BW             : int  4175 2242 2487 3214 3412 5421 4152 1745 5247 3529 ...
 $ married        : logi TRUE FALSE TRUE TRUE TRUE TRUE ...
 $ age            : int  24 30 37 22 23 43 28 30 19 22 ...
 $ black          : logi TRUE FALSE FALSE TRUE TRUE FALSE ...
 $ highschool     : logi TRUE TRUE FALSE FALSE FALSE TRUE ...
 $ college        : logi FALSE FALSE FALSE FALSE FALSE TRUE ...
 $ hasChildren    : logi FALSE FALSE FALSE TRUE FALSE FALSE ...
 $ drugs          : logi FALSE FALSE TRUE FALSE FALSE FALSE ...
 $ weightgain     : int  14 45 29 50 13 48 20 24 14 39 ...
 $ alone          : logi FALSE FALSE FALSE FALSE TRUE FALSE ...
 $ livesinownhouse: logi TRUE FALSE TRUE TRUE TRUE TRUE ...
 $ cig            : int  5 25 0 5 0 0 0 10 0 0 ...
 $ boy            : logi FALSE FALSE TRUE FALSE TRUE FALSE ...
I tried to fit both a classification tree and a regression tree in R. The response variable is "BW" (birth weight). For the classification tree, I recode BW into a binary variable:
bWeightSantiago_class <- dplyr::mutate(bWeightSantiago, BW= as.factor(ifelse(BW < 2500, 1, 0)))
This codes every newborn weighing less than 2500 g as 1 and all others as 0. The data include stillborn children. The recoded data set is saved as "bWeightSantiago_class".
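To see how skewed the recoded response is, the class frequencies can be tabulated. This is only a minimal sketch on a toy data frame: the ten BW values are the ones visible in the str() output above, and the names `toy` and `low` are hypothetical and used purely for illustration.

```r
# Toy stand-in for bWeightSantiago, using the ten BW values shown in str()
toy <- data.frame(BW = c(4175, 2242, 2487, 3214, 3412, 5421, 4152, 1745, 5247, 3529))

# Same recoding rule as above, written in base R instead of dplyr::mutate()
toy$low <- factor(ifelse(toy$BW < 2500, 1, 0), levels = c(0, 1))

# Absolute and relative class frequencies of the binary response
class_counts <- table(toy$low)
class_share  <- prop.table(class_counts)

print(class_counts)
print(class_share)
```

On the full data set the same two calls, applied to bWeightSantiago_class$BW, would show how rare the "1" class is.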
I tried two models: one with the rpart package and one with the partykit package.
rpart
If I fit a classification tree with the following call
fit1 <- rpart(BW~., data=bWeightSantiago_class, method="class")
I get the problem that the fit is not a tree, just a root.
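One thing that can be checked is whether the default complexity parameter (cp = 0.01) is what stops rpart at the root. The sketch below uses simulated stand-in data with a rare positive class, not the real birth weight data, and whether this is the cause in my case is an assumption. The names `toy`, `fit_default`, and `fit_cp0` are hypothetical.

```r
library(rpart)

set.seed(1)
n <- 2000

# Simulated stand-in data: a rare positive class weakly related to one predictor
toy <- data.frame(cig = rpois(n, 3),
                  age = sample(18:45, n, replace = TRUE))
toy$BW <- factor(rbinom(n, 1, plogis(-3.5 + 0.25 * toy$cig)))

# Default settings (cp = 0.01), analogous to fit1
fit_default <- rpart(BW ~ ., data = toy, method = "class")

# Same call with the complexity penalty switched off,
# to see whether any splits are found at all
fit_cp0 <- rpart(BW ~ ., data = toy, method = "class",
                 control = rpart.control(cp = 0))

# Number of nodes in each tree (1 means root only)
n_default <- nrow(fit_default$frame)
n_cp0     <- nrow(fit_cp0$frame)
print(c(default = n_default, cp0 = n_cp0))

# Complexity table of the unpenalised tree
print(fit_cp0$cptable)
```

If the cp = 0 tree has splits while the default one does not, the complexity table gives candidate cp values for pruning back with rpart's prune().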
If I fit a regression tree using the same function (now on the data set bWeightSantiago)
fit2 <- rpart(BW~., data=bWeightSantiago, method="anova")
I get a well-pruned tree with 5 leaves.
partykit
On the other hand, if I fit a regression or classification tree with the ctree() function:
fit3 <- ctree(BW~., data = bWeightSantiago)        # regression tree
fit4 <- ctree(BW~., data = bWeightSantiago_class)  # classification tree
I get huge trees (overfitting). Plotting them gives an almost completely black screen because they are not pruned. If I try to prune them with the prune() function, I get an error message:
prune(fit4, "AIC")
Error in UseMethod("prune") : no applicable method for 'prune' applied to an object of class "c('BinaryTree', 'BinaryTreePartition')"
However, the environment pane in R lists my fitted model as a Large BinaryTree (210.3 Mb).
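The class name in the error message can be compared with what partykit::ctree() returns. As far as I understand, "BinaryTree" is the class returned by ctree() from the older party package, while partykit::ctree() returns a "constparty"/"party" object. The sketch below checks this on the built-in iris data and restricts the tree size before fitting through ctree_control(). It assumes partykit is installed; the values alpha = 0.01 and maxdepth = 3 are arbitrary and not tuned for the birth weight data, and `fit_small` is a hypothetical name.

```r
# Assumes the partykit package is installed
library(partykit)

# Small built-in data set as a stand-in for bWeightSantiago_class.
# The tree is limited up front: a stricter significance level (alpha)
# and a maximum depth, instead of pruning afterwards.
fit_small <- partykit::ctree(
  Species ~ ., data = iris,
  control = partykit::ctree_control(alpha = 0.01, maxdepth = 3)
)

# Class of a partykit tree (for comparison with "BinaryTree")
fit_class <- class(fit_small)
print(fit_class)

# Size of the fitted tree: number of terminal nodes and depth
print(width(fit_small))
print(depth(fit_small))
```

Calling ctree() with the explicit partykit:: prefix avoids picking up a function of the same name from another attached package.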
How can I solve these problems?
Thanks!