I'm new to R and the rpart package. I want to create a tree using the following sample data.

My data set, mydata, is similar to this:

"","A","B","C","status"
"1",TRUE,TRUE,TRUE,"okay"
"2",TRUE,TRUE,FALSE,"okay"
"3",TRUE,FALSE,TRUE,"okay"
"4",TRUE,FALSE,FALSE,"notokay"
"5",FALSE,TRUE,TRUE,"notokay"
"6",FALSE,TRUE,FALSE,"notokay"
"7",FALSE,FALSE,TRUE,"okay"
"8",FALSE,FALSE,FALSE,"okay"

fit <- rpart(status ~ A + B + C, data = mydata, method = "class")

I also tried different formulas and different methods, but only the root node is ever produced, so no plot is possible. The output is:

fit
n= 8 
node), split, n, loss, yval, (yprob)
* denotes terminal node
1) root 8 3 okay (0.3750000 0.6250000) *

How can I create the tree? I need to show the percentage of "okay" and "notokay" at each node. I also need to specify one of A, B or C for splitting and show the statistics.

  • The function rpart only shows the root node if the other variables add no predictive value to the model. In other words: you can't differentiate between the different types of status using your variables; they are very bad predictors with no predictive power. – grrgrrbla Apr 23 '15 at 13:46

1 Answer

With the default settings of rpart(), no splits are considered at all. The minsplit parameter is 20 by default (see ?rpart.control), which is "the minimum number of observations that must exist in a node in order for a split to be attempted." So for your 8 observations, no splitting is even considered.
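As a quick check of the defaults described above, the control list can be inspected directly. This is only a minimal sketch; the variable name ctrl is chosen here for illustration.

```r
library(rpart)

# rpart.control() returns a plain list of the tuning parameters
ctrl <- rpart.control()

ctrl$minsplit   # 20: a node needs at least this many observations to be split
ctrl$minbucket  # 7: defaults to round(minsplit / 3)
ctrl$cp         # 0.01: a split must improve the fit by at least this factor
```

With only 8 observations, the root node already falls below minsplit, so rpart never evaluates a candidate split.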

If you are determined to consider splitting, you could decrease the minbucket and/or minsplit parameters. For example:

fit <- rpart(status ~ A + B + C, data = mydata,
  control = rpart.control(minsplit = 3))

produces the following tree:

[Figure: fitted rpart tree]

The display is created by

plot(partykit::as.party(fit), tp_args = list(beside = TRUE))

and the print output from rpart is:

n= 8 

node), split, n, loss, yval, (yprob)
      * denotes terminal node

1) root 8 3 okay (0.3750000 0.6250000)  
  2) A=FALSE 4 2 notokay (0.5000000 0.5000000)  
    4) B=TRUE 2 0 notokay (1.0000000 0.0000000) *
    5) B=FALSE 2 0 okay (0.0000000 1.0000000) *
  3) A=TRUE 4 1 okay (0.2500000 0.7500000) *
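To reproduce the fit and read off the class percentages per terminal node, something along these lines should work. It is a sketch under assumptions: the data frame is rebuilt by hand from the question, A, B and C are stored as factors (an assumption made here so that splits are treated as categorical), and the names leaf and leaf_pct are purely illustrative.

```r
library(rpart)

# Rebuild the 8-row sample from the question (factor coding is an assumption)
mydata <- data.frame(
  A = factor(c(TRUE, TRUE, TRUE, TRUE, FALSE, FALSE, FALSE, FALSE)),
  B = factor(c(TRUE, TRUE, FALSE, FALSE, TRUE, TRUE, FALSE, FALSE)),
  C = factor(c(TRUE, FALSE, TRUE, FALSE, TRUE, FALSE, TRUE, FALSE)),
  status = factor(c("okay", "okay", "okay", "notokay",
                    "notokay", "notokay", "okay", "okay"))
)

# Lower minsplit so that splits are attempted on this tiny data set
fit <- rpart(status ~ A + B + C, data = mydata, method = "class",
             control = rpart.control(minsplit = 3))

# fit$where gives, for each observation, the row of fit$frame holding its
# terminal node; the row names of fit$frame are the node numbers
leaf <- rownames(fit$frame)[fit$where]

# Row-wise percentages of each status within every terminal node
leaf_pct <- 100 * prop.table(table(leaf = leaf, status = mydata$status),
                             margin = 1)
print(round(leaf_pct, 1))
```

The same proportions appear in the (yprob) column of the printed tree; the table only covers terminal nodes, not inner ones.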

Whether or not this is particularly useful is a different question though...

Achim Zeileis