First, I recommend that you use the reimplementation of ctree() in the partykit package, which has been streamlined and improved and has a much cleaner infrastructure for the trees. This makes it easier to extract the surrogate splits. As a reproducible example, let's use:
library("partykit")
ct <- ctree(Species ~ ., data = iris, maxsurrogate = 3)
Now every inner node of the tree in ct has a $surrogates element holding (up to) 3 partysplit objects. For example, if I want to extract the 2nd surrogate split in the 3rd node, I can do:
nodeapply(ct, ids = 3, function(n) n$surrogates[[2]])
## $`3`
## $varid
## [1] 2
##
## $breaks
## [1] 6.1
##
## $index
## [1] 1 2
##
## $right
## [1] TRUE
##
## $prob
## NULL
##
## $info
## NULL
##
## attr(,"class")
## [1] "partysplit"
This means that this surrogate splits on the variable with varid = 2 in model.frame(ct) (i.e., Sepal.Length) at the split point breaks = 6.1. Because right = TRUE, the interval is closed on the right: observations with values <= 6.1 go to the first child node and the rest go to the second child node.
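The individual pieces of the split can also be pulled out one by one instead of being read off the printed list. The following is a minimal sketch that uses the partysplit accessor functions from partykit (varid_split(), breaks_split(), right_split()); the object names sp, vname, cutpoint and closed_right are just illustrative choices, not part of the package.

```r
library("partykit")

ct <- ctree(Species ~ ., data = iris, maxsurrogate = 3)

# 2nd surrogate split of node 3; nodeapply() returns a list, so take element 1
sp <- nodeapply(ct, ids = 3, function(n) n$surrogates[[2]])[[1]]

# map the integer varid back to a column name of the model frame
vname <- names(model.frame(ct))[varid_split(sp)]

# numeric split point(s) and whether the intervals are closed on the right
cutpoint <- breaks_split(sp)
closed_right <- right_split(sp)

print(vname)
print(cutpoint)
print(closed_right)
```

Given the output shown above, this should report Sepal.Length, 6.1 and TRUE, although the exact values may depend on the partykit version.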
To obtain this information in human-friendly form you can do:
sp32 <- nodeapply(ct, ids = 3, function(n) n$surrogates[[2]])
character_split(sp32[[1]], model.frame(ct))
## $name
## [1] "Sepal.Length"
##
## $levels
## [1] "<= 6.1" "> 6.1"
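If you need the surrogates of all inner nodes rather than a single one, the same idea can be applied across the tree. This is only a sketch: it assumes that the inner nodes are the node ids that are not terminal, and the names inner and surr are made up for illustration.

```r
library("partykit")

ct <- ctree(Species ~ ., data = iris, maxsurrogate = 3)

# ids of the inner (non-terminal) nodes
inner <- setdiff(nodeids(ct), nodeids(ct, terminal = TRUE))

# for every inner node, turn each surrogate split into a readable description
surr <- nodeapply(ct, ids = inner, function(n) {
  lapply(surrogates_node(n), character_split, data = model.frame(ct))
})

# surr is a list named by node id; each element is a list with one
# entry (name and levels) per surrogate split of that node
str(surr, max.level = 2)
```

Each entry has the same $name and $levels structure as the single-split result above, so it can be formatted or collected into a data frame as needed.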