I have used the partykit package to fit a model-based recursive partitioning (MOB) tree on a dataset, and I wondered whether there is a way to see which observations satisfy the tree's rules and fall into each node. (I want a separate data frame for each node, based on the tree rules.)
You can use predict(..., type = "node") with any partykit tree to obtain the predicted terminal node ID for each observation, and then pass those IDs to split() to partition the data set. For example:
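library("partykit")
# Fit a linear-model tree on the built-in cars data
tr <- lmtree(dist ~ speed, data = cars)
plot(tr)
# One data frame per terminal node, named by node ID
split(cars, predict(tr, type = "node"))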
## $`3`
## speed dist
## 1 4 2
## 2 4 10
## 3 7 4
## 4 7 22
## 5 8 16
## 6 9 10
## 7 10 18
## 8 10 26
## 9 10 34
## 10 11 17
## 11 11 28
## 12 12 14
## 13 12 20
## 14 12 24
## 15 12 28
##
## $`4`
## speed dist
## 16 13 26
## 17 13 34
## 18 13 34
## 19 13 46
## 20 14 26
## 21 14 36
## 22 14 60
## 23 14 80
## 24 15 20
## 25 15 26
## 26 15 54
## 27 16 32
## 28 16 40
## 29 17 32
## 30 17 40
## 31 17 50
##
## $`5`
## speed dist
## 32 18 42
## 33 18 56
## 34 18 76
## 35 18 84
## 36 19 36
## 37 19 46
## 38 19 68
## 39 20 32
## 40 20 48
## 41 20 52
## 42 20 56
## 43 20 64
## 44 22 66
## 45 23 54
## 46 24 70
## 47 24 92
## 48 24 93
## 49 24 120
## 50 25 85
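
As a follow-up to the split() call above, here is a minimal sketch of how the resulting list might be used. The variable names (node_id, groups, node_df, new_obs) are illustrative only and are not part of partykit; the newdata values are made up for the example.

```r
library("partykit")

tr <- lmtree(dist ~ speed, data = cars)

# Terminal node ID for every observation in the learning data
node_id <- predict(tr, type = "node")

# Named list with one data frame per terminal node
groups <- split(cars, node_id)

# The list names are the node IDs as character strings,
# so a single node's data frame can be pulled out by name
first_id <- names(groups)[1]
node_df <- groups[[first_id]]
head(node_df)

# Number of rows that ended up in each terminal node
sapply(groups, nrow)

# The same predict() call with newdata assigns unseen observations
# to terminal nodes (hypothetical speeds, chosen for illustration)
new_obs <- data.frame(speed = c(5, 15, 25))
predict(tr, newdata = new_obs, type = "node")
```

Keeping the node IDs in their own vector also makes it easy to attach them to the original data as an extra column instead of splitting, if a single data frame is more convenient.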

Achim Zeileis
-
Is there a way to get the sample size (n) of all nodes in the tree? – Lefkios Paikousis Dec 20 '18 at 21:06
-
If you only want the sample size, it is stored in the `info` of each node as `nobs`. Thus, you can do: `nodeapply(tr, nodeids(tr), function(x) info_node(x)$nobs)`. To obtain the full data associated with a particular node, use for example: `data_party(tr, id = 2)`. – Achim Zeileis Dec 20 '18 at 23:21
-
Much appreciated! Thanks – Lefkios Paikousis Dec 20 '18 at 23:42
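
The node-size and node-data calls mentioned in the comments can be put together roughly as follows. This is a sketch rather than a definitive recipe: the names n_all, term_ids, n_term and d are illustrative, and restricting to terminal nodes via nodeids(tr, terminal = TRUE) is an addition not spelled out in the comments.

```r
library("partykit")

tr <- lmtree(dist ~ speed, data = cars)

# Sample size stored in the info slot of every node (inner and terminal)
n_all <- unlist(nodeapply(tr, ids = nodeids(tr),
                          FUN = function(x) info_node(x)$nobs))
n_all

# Restrict to terminal nodes only
term_ids <- nodeids(tr, terminal = TRUE)
n_term <- unlist(nodeapply(tr, ids = term_ids,
                           FUN = function(x) info_node(x)$nobs))
n_term

# Cross-check against the predicted node IDs
table(predict(tr, type = "node"))

# Data associated with one node; the result may carry extra
# bookkeeping columns such as "(fitted)" next to the original variables
d <- data_party(tr, id = term_ids[1])
nrow(d)
```

For an inner node, data_party() returns the observations of all terminal nodes below it, so calling it with the root ID gives back the full learning sample.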