I have built a multi-layer decision tree using rpart, and I am trying to replicate the tree structure with the partykit package, more specifically with the partysplit/partynode combination.

The issue is that rpart and partysplit order the two branches of a split differently.

The decision tree coming from rpart always lists the "greater than or equal" branch (>=) first and the "less than" branch (<) underneath, while partykit does the opposite. For example, the rpart output

 [6] value.a >= 33: FALSE. (n = 63, err = 33.3%)
 [7] value.a< 33: FALSE. (n = 74, err = 8.1%)

vs. the partykit output

 [6] value.a < 33: FALSE. (n = 74, err = 8.1%)
 [7] value.a >= 33: FALSE. (n = 63, err = 33.3%)

As a result, I have trouble reading the decision tree in the correct order and using partykit to recreate the tree from rpart.

Is there a way to make rpart put the "less than" branch first, or is there an option in partysplit that makes the split put the "greater than or equal" branch first?

Achim Zeileis
Richard Li

1 Answer

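The partysplit() constructor function provides the arguments index and right, which can be combined to create all orderings. The index argument controls whether the lower interval (values below the break) is assigned to the first or the second child node. The right argument controls where the equal sign goes: with right = TRUE the intervals are closed on the right (<= and >), with right = FALSE they are closed on the left (< and >=). For a simple dummy data set: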

library("partykit")
d <- data.frame(x = 1:10)

we can create all four combinations:

sp1 <- partysplit(1L, 5.5, index = 1:2, right = TRUE)
sp2 <- partysplit(1L, 5.5, index = 2:1, right = TRUE)
sp3 <- partysplit(1L, 5.5, index = 1:2, right = FALSE)
sp4 <- partysplit(1L, 5.5, index = 2:1, right = FALSE)

Then the corresponding character labels can be computed as:

character_split(sp1, d)$levels
## [1] "<= 5.5" "> 5.5" 
character_split(sp2, d)$levels
## [1] "> 5.5"  "<= 5.5"
character_split(sp3, d)$levels
## [1] "< 5.5"  ">= 5.5"
character_split(sp4, d)$levels
## [1] ">= 5.5" "< 5.5" 
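
To connect this to the question, here is a minimal sketch of a hand-built stump that puts the ">=" branch first, using the same settings as sp4. The object names (sp, pn, py, labels, kids) are only illustrative, and the node ids 1 to 3 are an arbitrary choice for this toy example. kidids_split() reports, for each observation, the position of the kid it is sent to, which is a quick way to check that the ordering is what you expect.

```r
library("partykit")

# Dummy data with a single numeric variable, as above.
d <- data.frame(x = 1:10)

# index = 2:1 sends the lower interval (x < 5.5) to the second kid,
# right = FALSE puts the equal sign on the upper interval (x >= 5.5).
sp <- partysplit(1L, breaks = 5.5, index = 2:1, right = FALSE)

# Hand-built stump: root node 1 with two terminal kids (ids 2 and 3).
pn <- partynode(1L, split = sp,
                kids = list(partynode(2L), partynode(3L)))
py <- party(pn, data = d)

# Labels in kid order, and the kid position (1 = first, 2 = second)
# that each observation is assigned to.
labels <- character_split(sp, d)$levels
kids <- kidids_split(sp, data = d)

print(labels)
print(kids)
print(py)
```

With this split the observations with x >= 5.5 should land in the first kid and those with x < 5.5 in the second, so the printed tree lists the ">=" branch first.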

The as.party() method for rpart objects tries to preserve the branch order and the position of the equal sign used by rpart. For example:

library("partykit")
library("rpart")
iris2 <- iris[1:100,]
rp <- rpart(Species ~ ., data = iris2)
rp
## n= 100 
## 
## node), split, n, loss, yval, (yprob)
##       * denotes terminal node
## 
## 1) root 100 50 setosa (0.5000000 0.5000000)  
##   2) Petal.Length< 2.45 50  0 setosa (1.0000000 0.0000000) *
##   3) Petal.Length>=2.45 50  0 versicolor (0.0000000 1.0000000) *
as.party(rp)
## Model formula:
## Species ~ Sepal.Length + Sepal.Width + Petal.Length + Petal.Width
## 
## Fitted party:
## [1] root
## |   [2] Petal.Length < 2.45: setosa (n = 50, err = 0.0%)
## |   [3] Petal.Length >= 2.45: versicolor (n = 50, err = 0.0%)
## 
## Number of inner nodes:    1
## Number of terminal nodes: 2
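
If the converted tree does not look the way you expect, it can help to inspect the split that as.party() actually created. The following is a rough sketch using the accessor functions from partykit; the variable names (sp, brk, idx, rgt) are just placeholders, and the example reuses the iris subset from above.

```r
library("partykit")
library("rpart")

# Same two-class subset of iris as above.
iris2 <- iris[1:100, ]
rp <- rpart(Species ~ ., data = iris2)
py <- as.party(rp)

# Extract the split stored in the root node of the converted tree.
sp <- split_node(node_party(py))

brk <- breaks_split(sp)  # cut point of the split
idx <- index_split(sp)   # mapping of intervals to kids (may be NULL)
rgt <- right_split(sp)   # FALSE means "<" / ">=", TRUE means "<=" / ">"

print(brk)
print(idx)
print(rgt)
```

The same accessors work on a split you build yourself with partysplit(), so you can compare the two and adjust index and right until they match.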
Achim Zeileis