
My understanding is that the Gini decrease at a node can be calculated in a straightforward manner by subtracting the (sample-weighted) Gini impurity of the child nodes from the Gini impurity of the parent node. How are these per-node values then aggregated per feature across the whole forest?
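The per-node calculation described above can be sketched as follows. This is a minimal illustration, assuming the usual CART convention of weighting each child's impurity by its share of the parent's samples. The function names `gini` and `gini_decrease` are hypothetical, not taken from any package.

```python
def gini(counts):
    """Gini impurity of a node from its class counts: 1 - sum(p_k^2)."""
    n = sum(counts)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in counts)


def gini_decrease(parent, left, right):
    """Decrease for one split: parent impurity minus sample-weighted child impurities."""
    n = sum(parent)
    weighted_children = (sum(left) / n) * gini(left) + (sum(right) / n) * gini(right)
    return gini(parent) - weighted_children


# Parent with 50 samples of each class, split into two fairly pure children.
print(round(gini_decrease([50, 50], [45, 5], [5, 45]), 4))  # 0.32
```

Here the parent impurity is 0.5, each child's impurity is 0.18, so the decrease is 0.5 - 0.18 = 0.32.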

For example, I have seen many MeanDecreaseGini plots that show values of over 100 for some features. It seems unrealistic (or maybe it isn't??) that summing all the decreases at the nodes that split on a given feature (each a value between 0 and 1) for a given tree would produce such large numbers.
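The scale of the numbers depends on how each node's decrease is weighted, so one way to sanity-check the "between 0 and 1" assumption is to compare two weightings of the same split. This is a sketch only: `decrease_counts` is a hypothetical count-weighted variant (each node's impurity multiplied by its raw sample count), and whether a particular package such as R's randomForest uses fraction or count weighting is an assumption to verify against its source.

```python
def gini(counts):
    """Gini impurity of a node from its class counts."""
    n = sum(counts)
    return 1.0 - sum((c / n) ** 2 for c in counts) if n else 0.0


def decrease_fraction(parent, left, right):
    """Children weighted by their fraction of the parent's samples (bounded by the parent impurity)."""
    n = sum(parent)
    return gini(parent) - sum(left) / n * gini(left) - sum(right) / n * gini(right)


def decrease_counts(parent, left, right):
    """Hypothetical variant: every node's impurity weighted by its raw sample count."""
    return sum(parent) * gini(parent) - sum(left) * gini(left) - sum(right) * gini(right)


parent, left, right = [500, 500], [450, 50], [50, 450]
print(round(decrease_fraction(parent, left, right), 4))  # 0.32
print(round(decrease_counts(parent, left, right), 4))    # 320.0
```

The count-weighted value is simply the fraction-weighted value multiplied by the parent's sample size, so under that convention a single split near the root of a 1000-sample tree can already contribute hundreds.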

Any help would be greatly appreciated!

brucezepplin
  • https://www.stat.berkeley.edu/~breiman/RandomForests/cc_home.htm#varimp – alexwhitworth Sep 11 '18 at 13:19
  • Ah, so from that link: "Adding up the gini decreases for each individual variable over all trees in the forest gives a fast variable importance that is often very consistent with the permutation importance measure." So the decreases at all nodes that split on a given feature, over all trees, are added up? This makes more sense, but where does the average come in? It is called *Mean*DecreaseGini after all... – brucezepplin Sep 11 '18 at 13:25
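The R randomForest help page for `importance()` describes this measure as the total decrease in node impurities from splitting on the variable, averaged over all trees. Read that way, the "mean" is the per-feature sum divided by the number of trees, as in the sketch below. The toy `forest` structure (a list of `(feature, decrease)` pairs per tree) is hypothetical and only meant to show the aggregation, not how any package stores its trees.

```python
from collections import defaultdict

# Hypothetical toy forest: each tree is a list of (feature, gini_decrease) pairs,
# one pair per internal node that split on that feature.
forest = [
    [("x1", 0.30), ("x2", 0.10), ("x1", 0.05)],
    [("x2", 0.25), ("x1", 0.20)],
    [("x1", 0.40)],
]


def total_and_mean_decrease(forest):
    """Sum each feature's decreases over all nodes of all trees, then divide by the tree count."""
    totals = defaultdict(float)
    for tree in forest:
        for feature, decrease in tree:
            totals[feature] += decrease
    n_trees = len(forest)
    means = {feature: total / n_trees for feature, total in totals.items()}
    return dict(totals), means


totals, means = total_and_mean_decrease(forest)
print({f: round(m, 4) for f, m in means.items()})  # {'x1': 0.3167, 'x2': 0.1167}
```

Dividing the grand total by the number of trees gives the same result as first summing within each tree and then averaging those per-tree sums.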

0 Answers