
I have fitted a tree with rpart and would like to label each observation with its terminal node, and to link each terminal node to the rule that defines it.

I have used the following code as an example:

library(rpart)
library(rattle)
fit <- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis)
table(fit$where)
rattle::asRules(fit)

I'm able to label each observation via fit$where. The labels are:

> table(fit$where)

 3  5  7  8  9 
29 12 14  7 19 

First question: these labels do not correspond to the labels generated by rattle::asRules(fit), which are 3, 23, 22, 10, 4. How can I generate a mapping table between the two?

Second question: asRules only prints its output, whereas I would like to store the rules in a table rather than send them to standard output.

My expected result: a data frame that maps fit$where to the asRules labels, with another column holding the rule text as a string, e.g.:

 Rule number: 4 [Kyphosis=absent cover=29 (36%) prob=0.00]
   Start>=8.5
   Start>=14.5

If the text can be parsed into separate columns for ID, statistics and conditions, even better, but that is not mandatory.

I have found many related questions and links, but no final answer.

Thanks much, Kamashay

Progress update 29/01

I'm able to extract each rule separately if I have the rule ID, via path.rpart:

>path.rpart(fit,node=22) 

 node number: 22 
   root
   Start>=8.5
   Start< 14.5
   Age>=55
   Age>=111

This gives me the rule as a list that I can convert to a string. However, the IDs are consistent with the 'asRules' function and not with 'fit$where'...

Using "partykit" gets me the same results as "fit$where":

library("partykit")
> table(predict(as.party(fit), type = "node"))

 3  5  7  8  9 
29 12 14  7 19 

So I'm still not able to link the two (asRules IDs and fit$where IDs). I'm probably missing something fundamental, or there is a more straightforward way to do the task.

Can you help?

kamashay

4 Answers


You can find the rule number (in fact the leaf node number) corresponding to each fit$where using

> row.names(fit$frame)[fit$where]
 [1] "3"  "22" "3"  "3"  "4"  "4"  ...

You might get a little closer to your desired output with

> rattle::asRules(fit, TRUE)
R  3 [23%,0.58] Start< 8.5
R 23 [ 9%,0.57] Start>=8.5 Start< 14.5 Age>=55 Age< 111
...
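
The lookup above can be turned into the mapping table the question asks for. What follows is only a minimal sketch using rpart alone; the names leaf_rows and node_map are made up for illustration. It relies on fit$where holding row indices into fit$frame, whose row names are the node numbers that asRules and path.rpart report.

```r
library(rpart)

fit <- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis)

# Distinct values of fit$where: row positions of the leaves in fit$frame.
leaf_rows <- sort(unique(fit$where))

# One row per leaf: the fit$where label, the node number used by
# asRules()/path.rpart(), and the number of training rows in that leaf.
node_map <- data.frame(
  where = leaf_rows,
  node  = as.integer(row.names(fit$frame)[leaf_rows]),
  n     = fit$frame$n[leaf_rows]
)
print(node_map)
```

The n column should agree with table(fit$where), which gives a quick sanity check on the mapping.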
Graham Williams

Did you mean something like this?

library(rpart)
library(rpart.utils)
library(dplyr)

#model
fit <- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis)

#dataframe having leaf node's rule and subrule combination
rule_df <- rpart.rules.table(fit) %>%
  filter(Leaf==TRUE) %>%
  group_by(Rule) %>%
  summarise(Subrules = paste(Subrule, collapse=","))

#final dataframe
df <- kyphosis %>%
  mutate(Rule = row.names(fit$frame)[fit$where]) %>%
  left_join(rule_df, by="Rule")
head(df)

#subrule table
rpart.subrules.table(fit)

Output is:

  Kyphosis Age Number Start Rule    Subrules
1   absent  71      3     5    3          R1
2   absent 158      3    14   22 L1,R2,R3,L4
3  present 128      4     5    3          R1
4   absent   2      5     1    3          R1
5   absent   1      4    15    4       L1,L2
6   absent   1      2    16    4       L1,L2

Subrule definition:

  Subrule Variable Value Less Greater
1      L1    Start   8.5 <NA>     8.5
2      L2    Start  14.5 <NA>    14.5
3      L3      Age  <NA>   55    <NA>
4      L4      Age   111 <NA>     111
5      R1    Start  <NA>  8.5    <NA>
6      R2    Start  <NA> 14.5    <NA>
7      R3      Age    55 <NA>      55
8      R4      Age  <NA>  111    <NA>
Prem

You can get the node numbers of the rules (leaves) in this way:

nrules <- as.integer(rownames(fit$frame[fit$frame$var == "<leaf>",]))

You can also iterate over the rules like this:

rules <- lapply(nrules, path.rpart, tree=fit, pretty=0, print.it=FALSE)

Another alternative is to use the rpart.plot package:

rules <- rpart.plot::rpart.rules(fit, cover=TRUE, nn=TRUE)
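
The pieces above can be combined into a single data frame of rules and then joined back to the observations. This is a rough sketch rather than a definitive recipe: rules_df, labelled and the " & " separator are illustrative choices, and the yval column assumes a classification tree, where attr(fit, "ylevels") holds the class names.

```r
library(rpart)

fit <- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis)

# Terminal nodes: their row positions in fit$frame and their node numbers.
is_leaf    <- fit$frame$var == "<leaf>"
leaf_nodes <- as.integer(row.names(fit$frame)[is_leaf])

# path.rpart() returns one character vector of split labels per node,
# in the order the nodes were requested; the first label is "root".
paths <- path.rpart(fit, nodes = leaf_nodes, print.it = FALSE)

rules_df <- data.frame(
  node  = leaf_nodes,                    # label used by asRules()/path.rpart()
  where = which(is_leaf),                # label used by fit$where
  n     = fit$frame$n[is_leaf],          # training rows in the leaf
  yval  = attr(fit, "ylevels")[fit$frame$yval[is_leaf]],
  rule  = unname(vapply(paths,
                        function(p) paste(p[-1], collapse = " & "),
                        character(1))),
  stringsAsFactors = FALSE
)

# Attach node number and rule text to every observation via fit$where.
idx <- match(fit$where, rules_df$where)
labelled <- kyphosis
labelled$node <- rules_df$node[idx]
labelled$rule <- rules_df$rule[idx]
head(labelled)
```

Keeping both where and node in rules_df means the same table serves as the mapping between the two labelling schemes and as the rule lookup.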

Adriano Rivolli

For what it's worth, here is what I ended up using:

[1] To align the labels between fit$where and asRules, I used the solution by @Graham Williams. Alternatively, get the labels right in the first place by adapting the function from @VitoshKa: https://stackoverflow.com/a/30088268/8263160

[2] To create a list of nicely formatted rules in a data frame, I adapted and modified the parse_tree function by Tomáš Greif: https://www.r-bloggers.com/create-sql-rules-from-rpart-model/

kamashay