What I am trying to do:

I want to display a tree.

What I have done:

I used ML.NET to train a decision tree model using the random forest algorithm.

The random forest algorithm produces multiple regression trees, which I want to display.

When I acquire the tree list, I get each tree in the following format:

leaf_num=20

split_feature=818 315 1707 1866 829 315 958 1682 1669 934 1794 1708 1274 1876 219 1892 557 38 755

split_gain=53.6443 26.8314 26.0605 25.312 24.9156 23.8195 26.0324 20.622 19.2891 19.2309 18.7427 17.5442 17.5224 16.147 15.3396 14.6873 14.3071 18.8269 14.2999
threshold=-0.001959169982001185 -0.0063232299871742717 -0.018274550326168534 -0.023883100599050518 -0.0033916500397026535 -0.0086862500756978971 0.0057975498493760833 -0.024933900684118267 0.025956150144338611 0.032694099470973022 -0.02218380011618137 0.018020399846136573 0.029140749946236614 -0.034148199483752244 0.0075621248688548812 0.02500609960407019 0.0055113600101321944 0.012193250004202129 0.018237499520182613

left_child=3 4 14 11 12 15 10 -4 13 -9 -7 -1 -2 -6 -3 -5 -12 -18 -8
right_child=1 2 7 5 8 6 18 9 -10 -11 16 -13 -14 -15 -16 -17 17 -19 -20

leaf_value=0.74074074074073915 0.34995047870584339 0.22257551669316322 -0.78431372549019063 -0.61827023271969384 0.6299212598425189 -0.45209903121636041 -0.68965517241379315 0.55182171913689448 0.59405940594059359 -0.78212290502792992 0.36304961678096054 -0.58823529411764608 -0.65789473684210453 -0.42461317020510986 -0.71428571428571386 0.19801980198019786 -0.3720106288751106 0.78853046594981979 0.031796502384737663
leaf_count=29 121 25 8 115 10 37 59 226 16 7 99 8 12 111 20 16 45 11 25
internal_value=0 0.963751 1.81717 -0.900576 0.0592242 -1.18872 -0.521437 2.3387 -1.13736 2.55618 0.291424 2.2605 1.29168 -1.68372 -0.974314 -2.59225 0.902294 -0.712251 -2.37643
internal_count=1000 556 286 444 270 407 276 241 137 233 192 37 133 121 45 131 155 56 84
shrinkage=0.2
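
For anyone who wants to work with such a dump programmatically, here is a minimal sketch of reading it into plain arrays. It is written in Python rather than C#, it is not ML.NET code, and the names parse_number and parse_tree_dump are made up for illustration. It only assumes the space-separated key=value layout shown above.

```python
def parse_number(token):
    """Return an int when the token is a whole number, otherwise a float."""
    try:
        return int(token)
    except ValueError:
        return float(token)


def parse_tree_dump(text):
    """Map each key of a 'key=v1 v2 ...' dump to its list of numbers."""
    tree = {}
    for line in text.splitlines():
        key, sep, raw = line.strip().partition("=")
        if not sep:  # blank line or a line without '='
            continue
        tree[key] = [parse_number(tok) for tok in raw.split()]
    return tree


# Shortened copy of the dump above (only some of the fields).
dump = """
leaf_num=20
left_child=3 4 14 11 12 15 10 -4 13 -9 -7 -1 -2 -6 -3 -5 -12 -18 -8
right_child=1 2 7 5 8 6 18 9 -10 -11 16 -13 -14 -15 -16 -17 17 -19 -20
shrinkage=0.2
"""

tree = parse_tree_dump(dump)
print(len(tree["left_child"]), "internal nodes,", tree["leaf_num"][0], "leaves")
```

Every value ends up in a list, including single values such as leaf_num, so the caller does not need to special-case any field.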

The class in ML.NET that contains this data is defined in RegressionTree.cs.

From my understanding, left_child and right_child correspond to LteChild and GtChild. All of these arrays have one entry per internal (non-leaf) node, so with 20 leaves the length is 19.

split_feature is the index of the data column (feature) used for the split at that node.

I think the negative values are leaves.

My problem:

Unfortunately, I have not managed to construct a node-based tree like this

     O
    / \
   O   O
  / \ / \
 L  L L  L

because I do not understand how the values in the arrays are ordered.

UPDATE:

As always, when you ask others for help, it doesn't take long to come up with another promising idea yourself:

I dug deeper into the ML.NET code and I think I found a clue (line 856, for those who are interested).

My guess is that a counter variable is needed, such as int node.

This counter starts at 0.

To get the splitting column, read SplitFeatures[node].

The child nodes are LteChild[node] (less than or equal) and GtChild[node] (greater than),

and so on.
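
The walk guessed at above can be sketched roughly like this. It is a Python sketch, not the ML.NET implementation, and it rests on two assumptions: a sample goes to the left child when its feature value is less than or equal to the threshold (as the name LteChild suggests), and a negative child value c refers to leaf ~c, that is -c - 1. The function name find_leaf and the tiny three-leaf tree are made up for illustration.

```python
def find_leaf(features, split_feature, threshold, left_child, right_child):
    """Walk from the root (node 0) down to a leaf and return the leaf index."""
    node = 0
    while node >= 0:  # non-negative values are internal nodes
        if features[split_feature[node]] <= threshold[node]:
            node = left_child[node]   # "less than or equal" branch
        else:
            node = right_child[node]  # "greater than" branch
    return ~node  # ASSUMPTION: -1 is leaf 0, -2 is leaf 1, and so on


# Hypothetical tree with two internal nodes and three leaves.
split_feature = [0, 1]
threshold = [0.5, 2.0]
left_child = [1, -1]
right_child = [-2, -3]
leaf_value = [10.0, 20.0, 30.0]

sample = [0.0, 5.0]
leaf = find_leaf(sample, split_feature, threshold, left_child, right_child)
print("leaf", leaf, "value", leaf_value[leaf])
```

The loop needs no explicit node objects; the arrays themselves are the tree, and the sign of the current value tells whether the walk has reached a leaf.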

UPDATE2:

I think I got this right.

Structured data:

split_feature=  818   315   1707  1866  829   315   958   1682  1669  934   1794  1708  1274  1876  219   1892  557   38    755
(node index)    0     1     2     3     4     5     6     7     8     9     10    11    12    13    14    15    16    17    18
left_child=     3     4     14    11    12    15    10    -4    13    -9    -7    -1    -2    -6    -3    -5    -12   -18   -8
right_child=    1     2     7     5     8     6     18    9     -10   -11   16    -13   -14   -15   -16   -17   17    -19   -20
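
One way to turn these arrays into a node-based tree is to recurse from node 0. The following is a Python sketch under the assumptions that node 0 is the root and that a negative child value c denotes leaf ~c (-1 is leaf 0, -20 is leaf 19). The names Leaf, Node, build and render are made up for illustration, and the tree is printed as an indented outline rather than as ASCII art.

```python
from dataclasses import dataclass
from typing import Union

split_feature = [818, 315, 1707, 1866, 829, 315, 958, 1682, 1669, 934,
                 1794, 1708, 1274, 1876, 219, 1892, 557, 38, 755]
left_child = [3, 4, 14, 11, 12, 15, 10, -4, 13, -9,
              -7, -1, -2, -6, -3, -5, -12, -18, -8]
right_child = [1, 2, 7, 5, 8, 6, 18, 9, -10, -11,
               16, -13, -14, -15, -16, -17, 17, -19, -20]


@dataclass
class Leaf:
    index: int  # position in leaf_value / leaf_count


@dataclass
class Node:
    index: int  # position in the per-node arrays
    feature: int
    left: Union["Node", Leaf]
    right: Union["Node", Leaf]


def build(child):
    """Recursively turn an array entry into a Node or a Leaf."""
    if child < 0:
        return Leaf(~child)  # ASSUMPTION: negative value c means leaf ~c
    return Node(child, split_feature[child],
                build(left_child[child]), build(right_child[child]))


def render(node, depth=0):
    """Return the tree as a list of indented lines, left branch first."""
    pad = "  " * depth
    if isinstance(node, Leaf):
        return [f"{pad}L{node.index}"]
    lines = [f"{pad}O{node.index} (feature {node.feature})"]
    lines += render(node.left, depth + 1)
    lines += render(node.right, depth + 1)
    return lines


root = build(0)
lines = render(root)
print("\n".join(lines))
```

For this particular tree, every internal node from 1 to 18 appears exactly once as a child and the negative values cover -1 to -20 exactly once, which is consistent with this reading of the arrays.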

Once I have programmed a solution, I will post it as an answer.

Sebastian L

1 Answer

I posted the same question on the ML.NET GitHub page and got a good answer from Ivanidzo4ka of Microsoft.

The Link to the Issue: Link to github

Sebastian L