I have a decision tree output in a 'text' format which is very hard to read and interpret. There are a ton of pipes and indentation levels to follow in order to trace the tree, its nodes, and its leaves. I was wondering if there are tools out there where I can feed in a decision tree like the one below and get a tree diagram like the ones Weka, Python, etc. produce?

Since my decision tree is very large, below is a partial sample to give an idea of what my text decision tree looks like. Thanks a bunch!

"bio" <= 0.5:
|    "ml" <= 0.5:
|    |    "algorithm" <= 0.5:
|    |    |    "bioscience" <= 0.5:
|    |    |    |    "microbial" <= 0.5:
|    |    |    |    |    "assembly" <= 0.5:
|    |    |    |    |    |    "nano-tech" <= 0.5:
|    |    |    |    |    |    |    "smith" <= 0.5:
|    |    |    |    |    |    |    |    "neurons" <= 0.5:
|    |    |    |    |    |    |    |    |    "process" <= 1.5:
|    |    |    |    |    |    |    |    |    |    "program" <= 1.5:
|    |    |    |    |    |    |    |    |    |    |    "mammal" <= 1.0:
|    |    |    |    |    |    |    |    |    |    |    |    "lab" <= 0.5:
|    |    |    |    |    |    |    |    |    |    |    |    |    "human-machine" <= 1.5:
|    |    |    |    |    |    |    |    |    |    |    |    |    |    "tech" <= 0.5:
|    |    |    |    |    |    |    |    |    |    |    |    |    |    |    "smith" <= 0.5:
sharp

1 Answer

I'm not aware of any tool to interpret that format so I think you're going to have to write something, either to interpret the text format or to retrieve the tree structure using the DecisionTree class in MALLET's Java API.

Interpreting the text in Python shouldn't be too hard: for example, if

line = '|    |    |    |    |    "assembly" <= 0.5:'

then you can get the indent level, the predictor name and the split point with

parts = line.split('"')
indent = parts[0].count('|    ')
predictor = parts[1]
# drop the trailing ':' and take the last whitespace-separated token
splitpoint = float(parts[2].rstrip().rstrip(':').split()[-1])
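If the lines vary more than the sample suggests, a regular expression may be a little more robust than splitting on quotes. The following is only a sketch: the pattern assumes every split line looks like the ones in the question (pipe-based indentation, a quoted predictor, a comparison operator, a number, and a trailing colon), and `parse_line` is a hypothetical helper name, not part of any library.

```python
import re

# Assumed line shape: zero or more '|    ' indent units, a quoted predictor,
# a comparison operator, and a numeric split point followed by ':'.
LINE_RE = re.compile(
    r'^(?P<indent>(?:\|\s+)*)'        # pipe-based indentation
    r'"(?P<name>[^"]+)"\s*'           # quoted predictor name
    r'(?P<op><=|>=|<|>)\s*'           # comparison operator
    r'(?P<value>-?\d+(?:\.\d+)?)\s*:' # numeric split point, then ':'
)

def parse_line(line):
    """Return (indent, predictor, operator, splitpoint), or None if the
    line does not match the assumed shape."""
    m = LINE_RE.match(line)
    if m is None:
        return None
    indent = m.group('indent').count('|')
    return indent, m.group('name'), m.group('op'), float(m.group('value'))

print(parse_line('|    |    |    |    |    "assembly" <= 0.5:'))
```

Lines that do not fit the pattern (leaf lines, for instance, whose format is not shown in the question) come back as `None`, so they can be handled separately.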

To create graphical output, I would use GraphViz. There are Python APIs for it, but it's simple enough to build a file in its text-based dot format and create a graphic from it with the dot command. For example, the file for a simple tree might look like

digraph MyTree {
Node_1 [label="Predictor1"]
Node_1 -> Node_2 [label="< 0.335"]
Node_1 -> Node_3 [label=">= 0.335"]
Node_2 [label="Predictor2"]
Node_2 -> Node_4 [label="< 1.42"]
Node_2 -> Node_5 [label=">= 1.42"]
Node_3 [label="Class1\n(p=0.897, n=26)", shape=box,style=filled,color=lightgray]
Node_4 [label="Class2\n(p=0.993, n=17)", shape=box,style=filled,color=lightgray]
Node_5 [label="Class3\n(p=0.762, n=33)", shape=box,style=filled,color=lightgray]
}

and the resulting output from dot

[Image: graphviz tree output]
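Putting the two halves together, here is a rough sketch of how the indented text could be turned into a dot file. It assumes the input is well formed, i.e. each line sits at most one level deeper than the line before it, and that depth can be read from the number of pipes before the first quote. The function name `tree_text_to_dot` is made up for this example, and each line simply becomes one node labelled with its own text, since the question does not show what leaf lines look like.

```python
def tree_text_to_dot(lines, graph_name="MyTree"):
    """Convert pipe-indented tree lines into Graphviz dot source."""
    out = ["digraph %s {" % graph_name]
    path = []        # path[d] is the id of the latest node seen at depth d
    node_count = 0
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        # Depth = number of pipes before the first quote (assumed format).
        depth = line.split('"')[0].count('|')
        # Node label = the line without indentation or trailing colon,
        # with double quotes escaped for dot.
        label = line.lstrip('| ').rstrip().rstrip(':').replace('"', '\\"')
        node_count += 1
        node_id = "Node_%d" % node_count
        out.append('%s [label="%s"]' % (node_id, label))
        if depth > 0:
            # The parent is the latest node one level up.
            out.append("%s -> %s" % (path[depth - 1], node_id))
        del path[depth:]      # forget deeper nodes from earlier branches
        path.append(node_id)
    out.append("}")
    return "\n".join(out)

sample = [
    '"bio" <= 0.5:',
    '|    "ml" <= 0.5:',
    '|    |    "algorithm" <= 0.5:',
]
print(tree_text_to_dot(sample))
```

Saving the returned string to a file such as tree.dot and running `dot -Tpng tree.dot -o tree.png` should then produce the image.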

nekomatic