How is MeanDecreaseGini calculated for each feature in the randomForest package?

Asked Sep 11 '18 at 12:56

Active Sep 11 '18 at 13:17

As I understand it, the Gini decrease at a split is straightforward to calculate: take the Gini impurity of the parent node and subtract the Gini impurity of its child nodes. But how are these per-split decreases aggregated per feature across the whole forest?
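
A minimal sketch of that per-split calculation, in Python purely for illustration (it is not the randomForest package's code). It assumes the usual CART convention that each child's impurity is weighted by its share of the parent's samples; the function names `gini_impurity` and `gini_decrease` and the toy labels are hypothetical.

```python
from collections import Counter


def gini_impurity(labels):
    """Gini impurity 1 - sum_k p_k^2 for a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def gini_decrease(parent, left, right):
    """Parent impurity minus the size-weighted impurity of the two children.

    Assumption: children are weighted by their share of the parent's samples.
    """
    n = len(parent)
    weighted_children = (
        (len(left) / n) * gini_impurity(left)
        + (len(right) / n) * gini_impurity(right)
    )
    return gini_impurity(parent) - weighted_children


# Toy split: a balanced parent node divided into two mostly pure children.
parent = ["a"] * 5 + ["b"] * 5
left = ["a"] * 4 + ["b"] * 1
right = ["a"] * 1 + ["b"] * 4

print(gini_decrease(parent, left, right))  # roughly 0.18
```

Computed this way, a single split's decrease always lies between 0 and 1.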

For example, I have seen many MeanDecreaseGini plots that show values of over 100 for some features. That seems unrealistic (or maybe it isn't?): each per-node decrease lies between 0 and 1, so summing the decreases at all nodes that split on a given feature within a tree should not produce such large numbers.
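
For reference, the randomForest help page for `importance` describes this measure as the total decrease in node impurities from splitting on the variable, averaged over all trees. The sketch below shows one way that aggregation could produce values well above 1. It assumes, and this is an assumption rather than something confirmed here, that each split's decrease is scaled by the number of samples reaching the node. The function name `mean_decrease_gini`, the `(feature, n_samples, decrease)` tuple layout, and all numbers are hypothetical.

```python
from collections import defaultdict


def mean_decrease_gini(forest):
    """Sum each feature's split decreases over all trees, then divide by the tree count.

    forest: list of trees; each tree is a list of
            (feature, n_samples_at_node, gini_decrease) tuples, one per split.
    """
    totals = defaultdict(float)
    for tree in forest:
        for feature, n_samples, decrease in tree:
            # Assumption: the decrease is weighted by the node's sample count,
            # so a split near the root of a large data set contributes far more than 1.
            totals[feature] += n_samples * decrease
    n_trees = len(forest)
    return {feature: total / n_trees for feature, total in totals.items()}


# Two toy trees grown on 1000 samples each (made-up numbers).
forest = [
    [("x1", 1000, 0.18), ("x2", 600, 0.05), ("x1", 400, 0.10)],
    [("x1", 1000, 0.20), ("x2", 500, 0.02)],
]

importance = mean_decrease_gini(forest)
print(importance)
```

Under that weighting assumption, `x1` ends up with a value of about 210 even though no single split decreases impurity by more than 0.2, and the "mean" is taken over trees rather than over nodes.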

Any help would be greatly appreciated!

brucezepplin


https://www.stat.berkeley.edu/~breiman/RandomForests/cc_home.htm#varimp – alexwhitworth Sep 11 '18 at 13:19
Ah, so from that link: "Adding up the gini decreases for each individual variable over all trees in the forest gives a fast variable importance that is often very consistent with the permutation importance measure." So the decreases at all nodes over all trees for a given feature are added? That makes more sense, but where does the average come in? It is called *Mean*DecreaseGini after all. – brucezepplin Sep 11 '18 at 13:25
