
I created a decision tree with the party package in R, and I'm trying to get the route/branch with the maximum value.

It can be the mean value shown in the box plots of the terminal nodes (Picture 1)

or it can be the class probability shown in the terminal nodes of a classification tree (Picture 2)
(source: rdatamining.com)

Glorfindel
AsSAASA
  • The max value in the first tree will be in node 8. The max value in the second tree will be in node 5[2]. How can we recognize it automatically? – AsSAASA Feb 24 '16 at 12:25
  • Iterate over the leaves and select what you are interested in? – lejlot Feb 24 '16 at 23:17
  • Yes. I want to know which decisions I need to make if I want the max mean (pic 1) – AsSAASA Feb 25 '16 at 06:57
  • You need to go over the recursive structure of the output object from party. Probably the easiest way is to take [this question](http://stackoverflow.com/questions/25621611/converting-ctree-output-into-json-format-for-d3-tree-layout) as a starting point and select the info you want from the output format there. If you have more doubts, just post them in your question. – lrnzcig Feb 26 '16 at 09:41

1 Answer


This is actually fairly easy to do. Note, though, that while your definition of the maximum value is clear for a regression tree, it is less clear for a classification tree, because within each node every class level has its own probability and hence its own maximum

Either way, here's a simple helper function that returns the predictions at the terminal nodes for either type of tree

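GetPredicts <- function(ct) {
  # prediction stored in node i (a mean for regression, class probabilities for classification)
  f <- function(ct, i) nodes(ct, i)[[1]]$prediction
  # IDs of the terminal nodes that the training observations fall into
  Terminals <- unique(where(ct))
  Predictions <- sapply(Terminals, f, ct = ct)
  if (is.matrix(Predictions)) {
    # classification tree: one column per terminal node
    colnames(Predictions) <- Terminals
    return(Predictions)
  } else {
    # regression tree: named vector, one value per terminal node
    return(setNames(Predictions, Terminals))
  }
}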

Luckily, you took your trees from the examples in ?ctree, so we can test the function on them (next time, please provide the code you used yourself)


Regression tree (your first tree)

## load the package and create the tree
library(party)
airq <- subset(airquality, !is.na(Ozone))
airct <- ctree(Ozone ~ ., data = airq, 
               controls = ctree_control(maxsurrogate = 3))
plot(airct)

Now, test the function

res <- GetPredicts(airct)
res
#        5        3        6        9        8 
# 18.47917 55.60000 31.14286 48.71429 81.63333 

So we've got the prediction for each terminal node, named by node ID. You can easily proceed with which.max(res) from here (I'll leave it for you to decide)
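As a minimal sketch of that last step, the snippet below types in the values printed above by hand (so it runs without party) and picks the node with the largest mean. The names best, best_node and best_value are just illustrative.

```r
# Predictions per terminal node, copied from the output above
res <- c("5" = 18.47917, "3" = 55.60000, "6" = 31.14286,
         "9" = 48.71429, "8" = 81.63333)

best <- which.max(res)            # position of the largest prediction
best_node <- names(best)          # ID of that terminal node
best_value <- unname(res[best])   # the prediction itself

best_node
best_value
```

With the real tree you would call which.max directly on the result of GetPredicts(airct) instead of retyping the numbers.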


Classification tree (your second tree)

irisct <- ctree(Species ~ ., data = iris)
plot(irisct, type = "simple")

Run the function

res <- GetPredicts(irisct)
res
#      2          5   6          7
# [1,] 1 0.00000000 0.0 0.00000000
# [2,] 0 0.97826087 0.5 0.02173913
# [3,] 0 0.02173913 0.5 0.97826087

Now, the output is a bit harder to read because each class has its own probability in every node. You could make this a bit more readable using

row.names(res) <- levels(iris$Species)
res
#            2          5   6          7
# setosa     1 0.00000000 0.0 0.00000000
# versicolor 0 0.97826087 0.5 0.02173913
# virginica  0 0.02173913 0.5 0.97826087

Then, you could do something like the following in order to get the overall maximum value

which(res == max(res), arr.ind = TRUE)
#        row col
# setosa   1   1

For column/row maxes, you could do

matrixStats::colMaxs(res)
# [1] 1.0000000 0.9782609 0.5000000 0.9782609
matrixStats::rowMaxs(res)
# [1] 1.0000000 0.9782609 0.9782609
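If you'd rather see which class wins in each node and not just the maximum itself, here is a base R sketch. The matrix is typed in by hand from the output above so the snippet runs on its own, and top_class and top_prob are illustrative names.

```r
# Class probabilities per terminal node, copied from the output above
# (matrix() fills column by column, so each row below is one node)
res <- matrix(c(1, 0,          0,
                0, 0.97826087, 0.02173913,
                0, 0.5,        0.5,
                0, 0.02173913, 0.97826087),
              nrow = 3,
              dimnames = list(c("setosa", "versicolor", "virginica"),
                              c("2", "5", "6", "7")))

# Most probable class and its probability for every terminal node
top_class <- rownames(res)[apply(res, 2, which.max)]
top_prob  <- apply(res, 2, max)

data.frame(node = colnames(res), class = top_class, prob = top_prob)
```

Note that which.max returns the first maximum, so the tie in node 6 (0.5 vs 0.5) is reported as versicolor. Handle ties explicitly if that matters to you.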

But, again, I'll leave it to you to decide how to proceed from here.
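If you also need the route to a node and not only its ID, one option is a recursive walk from the root that records the split taken at each step. The sketch below runs on a toy nested list, not on a real party object. It assumes each node carries nodeID, terminal, psplit$variableName, psplit$splitpoint, left and right, which I believe mirrors what airct@tree holds, but check with str(airct@tree) first. The helpers leaf, split_node and find_route are made up for this example, and the thresholds for nodes 2 and 4 are placeholders.

```r
# Toy stand-ins for tree nodes (hypothetical helpers, not party functions)
leaf <- function(id) list(nodeID = id, terminal = TRUE)
split_node <- function(id, var, point, left, right) {
  list(nodeID = id, terminal = FALSE,
       psplit = list(variableName = var, splitpoint = point),
       left = left, right = right)
}

# Same node IDs as the regression tree; thresholds of nodes 2 and 4 are placeholders
toy <- split_node(1, "Temp", 82,
                  split_node(2, "Wind", 7,
                             leaf(3),
                             split_node(4, "Temp", 77, leaf(5), leaf(6))),
                  split_node(7, "Wind", 10.3, leaf(8), leaf(9)))

# Depth-first search that returns the split conditions leading to `target`,
# or NULL if no node with that ID exists
find_route <- function(node, target, path = character(0)) {
  if (node$nodeID == target) return(path)
  if (node$terminal) return(NULL)
  v <- node$psplit$variableName
  p <- node$psplit$splitpoint
  left <- find_route(node$left, target, c(path, paste(v, "<=", p)))
  if (!is.null(left)) return(left)
  find_route(node$right, target, c(path, paste(v, ">", p)))
}

route <- find_route(toy, 8)
route
```

This only covers numeric splits where the left branch means "less than or equal to the split point". Splits on factors would need their own handling.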

David Arenburg
  • Hello David, this is exactly what I meant. – AsSAASA Mar 23 '16 at 05:49
  • Is there a way to pull the route of the highest value out of the tree? For example, in the regression tree I'd ask for the route of node 8 (temp > 82, wind <10.3). Thanks!!! – AsSAASA Mar 23 '16 at 05:57