I am struggling with a specific problem: I want to fit a decision tree.

My data set currently looks like this:

str(bWeightSantiago)
'data.frame': 160487 obs. of 13 variables:
$ BW: int 4175 2242 2487 3214 3412 5421 4152 1745 5247 3529 ...
$ married : logi TRUE FALSE TRUE TRUE TRUE TRUE ...
$ age : int 24 30 37 22 23 43 28 30 19 22 ...
$ black : logi TRUE FALSE FALSE TRUE TRUE FALSE ...
$ highschool : logi TRUE TRUE FALSE FALSE FALSE TRUE...
$ college: logi FALSE FALSE FALSE FALSE FALSE TRUE ...
$ hasChildren : logi FALSE FALSE FALSE TRUE FALSE FALSE ...
$ drugs: logi FALSE FALSE TRUE FALSE FALSE FALSE ...
$ weightgain : int 14 45 29 50 13 48 20 24 14 39 ...
$ alone: logi FALSE FALSE FALSE FALSE TRUE FALSE ...
$ livesinownhouse : logi TRUE FALSE TRUE TRUE TRUE TRUE ...
$ cig : int 5 25 0 5 0 0 0 10 0 0 ...
$ boy : logi FALSE FALSE TRUE FALSE TRUE FALSE ...

I tried to fit both a classification tree and a regression tree in R. The response variable is "BW" (birth weight). For the classification tree, I recode BW as a binary factor:

bWeightSantiago_class <- dplyr::mutate(bWeightSantiago, BW= as.factor(ifelse(BW < 2500, 1, 0)))

so that every newborn weighing less than 2500 g is coded as 1 and all others as 0. The data include stillborn children. The recoded data set is stored as "bWeightSantiago_class".
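To see how unbalanced the recoded response is, the class frequencies can be tabulated. The following is a minimal sketch on simulated data: the data frame `sim` and its BW values are made up for illustration and are not the real bWeightSantiago data, and the recoding uses base R `transform()` in place of `dplyr::mutate()`.

```r
# Hypothetical stand-in for bWeightSantiago: only BW is simulated here
set.seed(1)
sim <- data.frame(BW = as.integer(round(rnorm(1000, mean = 3300, sd = 550))))

# Same recoding rule as above (BW < 2500 becomes 1), written with base R
sim_class <- transform(sim, BW = as.factor(ifelse(BW < 2500, 1, 0)))

# Absolute and relative class frequencies of the binary response
class_counts <- table(sim_class$BW)
print(class_counts)
print(prop.table(class_counts))
```

If class 1 turns out to be only a small fraction of the rows, that imbalance is worth keeping in mind when reading the tree results.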

I tried two approaches: one with the rpart package and one with the partykit package.

rpart

If I fit a classification tree with

fit1 <- rpart(BW~., data=bWeightSantiago_class, method="class")

I only get the message:

fit is not a tree, just a root
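A root-only classification tree often shows up when the response is strongly unbalanced and no split improves the fit by at least the default complexity parameter of `rpart.control()`, `cp = 0.01`. The sketch below compares the default with a smaller `cp` on simulated data. The data frame `sim`, its columns, the simulated effect sizes, and the value `cp = 0.0005` are all assumptions chosen for illustration, so whether the second fit actually splits depends on the data.

```r
library(rpart)

# Hypothetical stand-in data: a rare binary outcome weakly tied to two predictors
set.seed(42)
n <- 5000
sim <- data.frame(
  age   = sample(18:45, n, replace = TRUE),
  cig   = rpois(n, 3),
  drugs = runif(n) < 0.05
)
p_low  <- plogis(-3 + 0.08 * sim$cig + 1 * sim$drugs)
sim$BW <- factor(rbinom(n, 1, p_low), levels = c(0, 1))

# Fit with the default control settings (cp = 0.01)
fit_default <- rpart(BW ~ ., data = sim, method = "class")

# Same model, but splits with a much smaller improvement are allowed
fit_small_cp <- rpart(BW ~ ., data = sim, method = "class",
                      control = rpart.control(cp = 0.0005))

# Number of nodes in each fit (1 means root only) and the cp table
print(nrow(fit_default$frame))
print(nrow(fit_small_cp$frame))
printcp(fit_small_cp)
```

A tree grown with a small `cp` can be cut back afterwards with `prune(fit_small_cp, cp = ...)`, using the cross-validated errors in the cp table to pick the value.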

If I fit a regression tree in the same way (now on the original data set bWeightSantiago)

fit2 <- rpart(BW~., data=bWeightSantiago, method="anova")

I get a well-pruned tree with 5 leaves.

partykit

On the other hand, if I fit a regression or a classification tree with the ctree() function:

fit3 <- ctree(BW~., data = bWeightSantiago) #regression tree

fit4 <- ctree(BW~., data = bWeightSantiago_class) # classification tree

I get huge trees (overfitting). Plotting them gives an almost entirely black screen because they are not pruned. If I try to prune them with prune(), I get an error message:

prune(fit4, "AIC")

Error in UseMethod("prune") : no applicable method for 'prune' applied to an object of class "c('BinaryTree', 'BinaryTreePartition')"

Yet the environment listing in R says that my fitted model is a Large BinaryTree (210.3 Mb).
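Two things may be worth checking here. First, "BinaryTree" is the class returned by ctree() from the party package, whereas partykit::ctree() returns an object of class "constparty", so `class(fit4)` shows which implementation was actually called. Second, the size of a ctree() fit is usually limited at fitting time through ctree_control() rather than by pruning afterwards. The sketch below uses partykit on simulated data. The data frame `sim`, its columns, and the values `maxdepth = 3` and `minbucket = 50` are assumptions for illustration, not recommended settings.

```r
library(partykit)

# Hypothetical stand-in data for a regression tree on birth weight
set.seed(7)
n <- 2000
sim <- data.frame(
  age        = sample(18:45, n, replace = TRUE),
  cig        = rpois(n, 3),
  weightgain = sample(5:60, n, replace = TRUE)
)
sim$BW <- 3400 - 25 * sim$cig + 4 * sim$weightgain + rnorm(n, sd = 500)

# Limit tree growth up front: at most 3 levels of splits,
# and at least 50 observations in every terminal node
fit_small <- ctree(BW ~ ., data = sim,
                   control = ctree_control(maxdepth = 3, minbucket = 50))

# Which implementation produced the object, and how large the tree is
print(class(fit_small))
print(depth(fit_small))   # number of split levels
print(width(fit_small))   # number of terminal nodes
```

With a tree of this size, `plot(fit_small)` stays readable. Raising `mincriterion` in ctree_control() is another way to make splitting more conservative.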

How can I solve these problems?

Thanks!

  • 1
    This will help you concerning your first problem: https://stackoverflow.com/questions/37548651/decision-tree-in-r-errorfit-is-not-a-tree-just-a-root – AntoniosK Oct 31 '18 at 16:08
