R partykit::ctree() how to break tie in selecting splitting variable of identical p-value

Question

For a node x in a partykit::ctree object, I use the following lines to get the splitting variable of the node:

k=info_node(x)
names(k$p.value)

However, the splitting variable returned by this code differs from the one shown on the tree drawn by plot. It turns out that three columns in k$criterion share the minimum p-value, i.e.:

inds=which(k$criterion['p.value',]==k$p.value)
length(inds) #3

It seems that info_node(x) returns the first of the three variables as names(k$p.value), but plot chooses the third one. I wonder whether this discrepancy has one of two causes:

Multiple variables have the minimum p-value, and there is an internal method that breaks such a tie to select a single splitting variable.
Maybe these three variables have slightly different p-values, but because of the fixed precision of the p-values in k$criterion, they appear to be identical.

Any insight is appreciated!
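
The second hypothesis can be checked directly by comparing the p-values at full double precision instead of relying on the printed values. The following is a minimal sketch on a made-up matrix `crit` that only mimics the layout of k$criterion (a `p.value` row with one column per variable); the numbers and the tolerance `1e-12` are assumptions chosen for illustration, not output from partykit.

```r
# Toy stand-in for k$criterion; all values are made up for illustration.
crit <- rbind(
  statistic = c(b = 13.66, c = 13.66, d = 13.66 + 1e-12),
  p.value   = c(b = 2.2e-4, c = 2.2e-4, d = 2.2e-4 - 1e-18)
)
p <- crit["p.value", ]

print(p)               # default printing rounds to 7 significant digits
sprintf("%.17g", p)    # enough digits to tell distinct doubles apart

# Exact ties (bitwise equal to the minimum) vs. ties within a tolerance
exact_ties <- which(p == min(p))
near_ties  <- which(abs(p - min(p)) <= 1e-12)
names(exact_ties)
names(near_ties)
```

If `exact_ties` has length one while `near_ties` is longer, the apparent tie comes only from rounding in the printed output.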

score 0 · Accepted Answer · answered Jul 01 '20 at 03:14

The comparisons are done internally on the log-p-value scale, which makes them more reliable for tiny p-values. If ties (within machine precision) still remain among the p-values, they are broken based on the size of the corresponding test statistic.
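
As an illustration only, the rule described above could be sketched in base R as follows. The helper `pick_split_variable()`, its arguments, and the tolerance are hypothetical and are not taken from the partykit source; the input values are made up.

```r
# Hypothetical helper, not part of partykit: pick the variable with the
# smallest log-p-value and break ties by the largest test statistic.
pick_split_variable <- function(log_p, statistic,
                                tol = sqrt(.Machine$double.eps)) {
  stopifnot(length(log_p) == length(statistic), !is.null(names(log_p)))
  # Candidates whose log-p-value equals the minimum within the tolerance
  tied <- which(abs(log_p - min(log_p)) <= tol)
  # Among the tied candidates, prefer the largest test statistic
  winner <- tied[which.max(statistic[tied])]
  names(log_p)[winner]
}

# x1 and x2 tie on the p-value; x2 has the larger statistic
log_p <- c(x1 = log(1e-4), x2 = log(1e-4), x3 = log(0.03))
stat  <- c(x1 = 12.5, x2 = 15.1, x3 = 4.7)
pick_split_variable(log_p, stat)
```

Note that when both the p-value and the statistic are exactly identical, `which.max()` in this sketch simply returns the first tied candidate, which is an arbitrary choice of this illustration.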

Achim Zeileis

Thanks! how is the `names(k$p.value)` chosen? – blueskyddd Jul 02 '20 at 14:27
Yes. This is the name of the variable associated with the smallest p-value, after breaking ties if necessary. – Achim Zeileis Jul 02 '20 at 23:04
Then why is `names(k$p.value)` different from the one on the tree by `plot`? – blueskyddd Jul 03 '20 at 01:13
Please post a minimal self-contained reproducible example, then we can have a look. – Achim Zeileis Jul 03 '20 at 01:16
hi @AchimZeileis, I posted an example! Can you please take a look? – blueskyddd Jul 06 '20 at 20:30
OK, I see: When both statistic and p-value are exactly identical, then the problem occurs. This looks like a bug to me. I'll forward this to Torsten as the main author of the code. – Achim Zeileis Jul 06 '20 at 23:55
Torsten fixed the bug and submitted a new version of `partykit` to CRAN. – Achim Zeileis Jul 08 '20 at 20:52
hi @AchimZeileis, thanks for the reply! was it a fix on the algorithm, i.e. on how to break a tie, or a fix on coding? – blueskyddd Jul 10 '20 at 16:24
The algorithm wasn't applied consistently, so the `names(k$p.value)` were wrong. – Achim Zeileis Jul 10 '20 at 22:21

score 0 · Answer 2 · answered Jul 04 '20 at 03:28

here is one example. Thank you!

library(partykit)

# Response: a single 'Y' among 87 observations
a <- rep("N", 87)
a[77] <- "Y"
# Two binary predictors that both flag observation 77
b <- rep(FALSE, 87)
b[c(7, 10, 11, 33, 56, 77)] <- TRUE
d <- rep(1, 87)
d[c(29, 38, 40, 42, 65, 77)] <- 0
dfb <- data.frame(a = as.factor(a), b = as.factor(b), d = as.factor(d))

tFit <- ctree(a ~ ., data = dfb,
              control = ctree_control(minsplit = 10, minbucket = 5,
                                      maxsurrogate = 2, alpha = 0.05))
plot(tFit)                     # displayed splitting variable is d
tNodes <- node_party(tFit)     # root node of the fitted tree
nodeInfo <- info_node(tNodes)
names(nodeInfo$p.value)        # b, not d
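
To compare the two variables without reading them off the plot, the split stored in the root node can be queried directly. This is a sketch that assumes the index returned by `varid_split()` refers to the columns of `tFit$data`; the names `root`, `split_var` and `info_var` are introduced here for illustration. The block rebuilds the data from the example above so that it runs on its own.

```r
library(partykit)

# Rebuild the data from the example above
a <- rep("N", 87); a[77] <- "Y"
b <- rep(FALSE, 87); b[c(7, 10, 11, 33, 56, 77)] <- TRUE
d <- rep(1, 87); d[c(29, 38, 40, 42, 65, 77)] <- 0
dfb <- data.frame(a = as.factor(a), b = as.factor(b), d = as.factor(d))
tFit <- ctree(a ~ ., data = dfb,
              control = ctree_control(minsplit = 10, minbucket = 5,
                                      maxsurrogate = 2, alpha = 0.05))

root <- node_party(tFit)
# Variable stored in the primary split of the root node
# (assumption: varid_split() indexes the columns of tFit$data)
split_var <- names(tFit$data)[varid_split(split_node(root))]
# Variable reported alongside the selected p-value
info_var <- names(info_node(root)$p.value)
c(split = split_var, info = info_var)
```

If the two entries differ, the node shows the discrepancy described in the question.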

hi @AchimZeileis, here is the example! – blueskyddd Jul 04 '20 at 03:29
