I'm currently working on a project in which I use a random forest. I want to know the feature importance of all covariates, and I want to use MeanDecreaseGini for this.
I don't understand how its values can be greater than 0.5. For a two-class problem, the Gini index of a node can't exceed 0.5, so the decrease produced by a split shouldn't either. If the decreases are averaged over all nodes in the forest where a specific covariate was used for splitting, the mean decrease in Gini shouldn't exceed 0.5 either. Can anybody tell me where my reasoning goes wrong?
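To make the bound I have in mind concrete, here is a minimal sketch of the two-class Gini index as I understand it. The helper `gini` is my own illustration and not a function from the randomForest package, and I am assuming (possibly wrongly) that this is the quantity behind MeanDecreaseGini.

```r
# Two-class Gini index of a node, where p is the proportion of class 1.
# `gini` is a hypothetical helper written for this question, not part of randomForest.
gini <- function(p) 1 - p^2 - (1 - p)^2

gini(0.5)      # 0.5, the largest possible value for two classes
gini(20 / 50)  # impurity of a node with 20 cases of class 1 and 30 of class 0

# Check the bound on a grid of proportions between 0 and 1
p <- seq(0, 1, by = 0.01)
max_gini <- max(gini(p))
max_gini
```

Under this definition no single node, and therefore no single split, could contribute more than 0.5.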
Here is an example in which the MeanDecreaseGini values are much greater than 0.5:
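# install.packages("randomForest")  # run once if the package is not installed
library(randomForest)

set.seed(1)  # make the simulated data and the forest reproducible

# Binary response: 20 cases of class 1, 30 cases of class 0
a <- as.factor(c(rep(1, 20), rep(0, 30)))
# Two numeric covariates
b <- c(rnorm(20, 5, 2), rnorm(30, 4, 1))
c <- c(rnorm(25, 0, 1), rnorm(25, 1, 2))
data <- data.frame(a = a, b = b, c = c)

rf <- randomForest(a ~ b + c, data = data, importance = TRUE, ntree = 300)
importance(rf)  # the MeanDecreaseGini column holds the values in question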