
I would like a way to turn an rpart tree object into a nested list of lists (a dendrogram). Ideally, the attributes in each node will include the information in the rpart object (impurity, variable and rule that is used for splitting, the number of observations funneled to that node, etc.).

Looking at the rpart$frame object, it is not clear to me how to read it. Any suggestions?

Tiny example:

library(rpart)
fit <- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis)
fit$frame
      var  n wt dev yval complexity ncompete nsurrogate    yval2.V1    yval2.V2    yval2.V3    yval2.V4    yval2.V5 yval2.nodeprob
1   Start 81 81  17    1 0.17647059        2          1  1.00000000 64.00000000 17.00000000  0.79012346  0.20987654     1.00000000
2   Start 62 62   6    1 0.01960784        2          2  1.00000000 56.00000000  6.00000000  0.90322581  0.09677419     0.76543210
4  <leaf> 29 29   0    1 0.01000000        0          0  1.00000000 29.00000000  0.00000000  1.00000000  0.00000000     0.35802469
5     Age 33 33   6    1 0.01960784        2          2  1.00000000 27.00000000  6.00000000  0.81818182  0.18181818     0.40740741
10 <leaf> 12 12   0    1 0.01000000        0          0  1.00000000 12.00000000  0.00000000  1.00000000  0.00000000     0.14814815
11    Age 21 21   6    1 0.01960784        2          0  1.00000000 15.00000000  6.00000000  0.71428571  0.28571429     0.25925926
22 <leaf> 14 14   2    1 0.01000000        0          0  1.00000000 12.00000000  2.00000000  0.85714286  0.14285714     0.17283951
23 <leaf>  7  7   3    2 0.01000000        0          0  2.00000000  3.00000000  4.00000000  0.42857143  0.57142857     0.08641975
3  <leaf> 19 19   8    2 0.01000000        0          0  2.00000000  8.00000000 11.00000000  0.42105263  0.57894737     0.23456790

(the function ggdendro:::dendro_data.rpart might be helpful somehow, but I couldn't get it to really solve the problem)

Tal Galili
  • The function `ggdendro:::dendro_data.rpart` needs a `model` argument of class `tree`, but `rpart(...)` returns an object of class `rpart`, not `tree`, and the two may behave as different classes; see [this](https://stat.ethz.ch/pipermail/r-help/2005-May/070922.html) – parth Jul 11 '17 at 10:25

1 Answer


Here is a GitHub gist with the function rpart2dendro for converting an object of class "rpart" to a dendrogram. Note that branches are not weighted in the output object, but it should be fairly straightforward to recursively modify the "height" attributes of the dendrogram to get proportional branch lengths. The Kyphosis example is provided at the bottom.
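As for reading `fit$frame` directly: the row names are node numbers, with the root as node 1 and the children of node k numbered 2k (left) and 2k + 1 (right). The following is only a minimal sketch of that idea, not the `rpart2dendro` function from the gist. The names `rpart_to_list` and `build` are made up for illustration, and the result is a plain nested list rather than an object of class "dendrogram".

```r
library(rpart)

fit <- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis)

# Hypothetical helper (not part of rpart): walk fit$frame by node number
# and return a nested list with one element per node.
rpart_to_list <- function(fit) {
  frame <- fit$frame
  ids   <- as.integer(rownames(frame))  # node numbers: 1, 2, 4, 5, ...
  rules <- labels(fit)                  # split label leading into each node

  build <- function(node) {
    i <- match(node, ids)               # row of the frame for this node
    out <- list(
      node = node,
      rule = rules[i],                  # rule that sends observations here
      var  = as.character(frame$var[i]),# split variable, "<leaf>" if terminal
      n    = frame$n[i],                # observations reaching this node
      dev  = frame$dev[i],              # deviance (loss) at this node
      yval = frame$yval[i]              # fitted value / predicted class code
    )
    if (out$var != "<leaf>") {
      # Children of node k are always numbered 2k and 2k + 1.
      out$children <- list(build(2L * node), build(2L * node + 1L))
    }
    out
  }

  build(1L)
}

tree <- rpart_to_list(fit)
str(tree, max.level = 2)
```

From a structure like this, adding the attributes that `stats` expects on a dendrogram ("members", "height", "label", "leaf") should be a matter of one more recursive pass, which is also where proportional branch lengths could be set.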

Shaun Wilkinson