
I want to run a Boruta algorithm that uses the importance of a random forest made with the ranger function. However, when using the code below, I get the error "Error in getImp(cbind(x[, decReg != "Rejected"], xSha), y, ...): could not find function "getImp""

If I run the code without the getImp argument, it runs fine, but in that case it uses a default value for getImp which is not what I prefer. How can I pass the importance from my custom ranger function correctly to the Boruta function? btw, ChatGPT can't fix it ;-)

Documentation from R help: getImp
the function used to obtain attribute importance. The default is getImpRfZ, which runs random forest from the ranger package and gathers Z-scores of mean decrease accuracy measure. It should return a numeric vector of a size identical to the number of columns of its first argument, containing an importance measure of respective attributes. Any order-preserving transformation of this measure will yield the same result. It is assumed that more important attributes get higher importance. +-Inf are accepted, NaNs and NAs are treated as 0s, with a warning.

rf_ranger <- ranger::ranger(group ~ .,data = dat,
                            num.trees=10000,
                            splitrule='extratrees',
                            min.node.size=1,
                            importance = 'impurity',
                            mtry = 2)



ranger_imp <- rf_ranger$variable.importance
matrix_ranger_importance <- as.matrix(ranger_imp)
colnames(matrix_ranger_importance) <- "MeanDecreaseGini"


boruta.model <- Boruta(group ~ .,   #outcome & predictors
                       data = dat,
                       pValue = 0.01,
                       doTrace = 2,  # verbosity level
                       maxRuns = 100,
                       getImp = matrix_ranger_importance)

Sample data:

dat <- data.frame(group = sample(factor(c("active", "control")), 10, replace = TRUE),
                 v1 = sample(c(0,1),10, replace = TRUE),
                 v2 = sample(c(0,1),10, replace = TRUE),
                 v3 = sample(c(0,1),10, replace = TRUE),
                 v4 = sample(c(0,1),10, replace = TRUE),
                 v5 = sample(c(0,1),10, replace = TRUE))
Joep_S
  • `getImp` has to be a function according to the documentation. You might need to write your own function with arguments similar to those of the `getImpRfZ` function. – Clemsang Apr 25 '23 at 12:48
  • I did that, but now I get the error: Error in eval(f[[2]], envir = data, enclos = env) : numeric 'envir' arg not of length one – Joep_S Apr 25 '23 at 13:23
  • Have you put the argument `...`: "parameters passed to the underlying ranger call; they are relayed from ... of Boruta." ? Please update the question with your new try. – Clemsang Apr 25 '23 at 13:27

1 Answer


As @Clemsang mentioned in the comments, the Boruta argument getImp must be a function, not a precomputed importance matrix. The default is getImpRfZ, which runs a random forest from the ranger package. The package implements its importance sources as small adapter functions; getImpRfZ, for example, has the following signature (see https://cran.r-project.org/web/packages/Boruta/Boruta.pdf):

getImpRfZ(x, y, ntree = 500, num.trees = ntree, ...)

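To use the settings from your question (10000 trees, splitrule = 'extratrees', min.node.size = 1, importance = 'impurity', mtry = 2), write your own adapter modelled on getImpRfZ and set them in its ranger call, like below: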

library(Boruta)
set.seed(123)
dat <- data.frame(group = sample(factor(c("active", "control")), 10, replace = TRUE),
                  v1 = sample(c(0,1),10, replace = TRUE),
                  v2 = sample(c(0,1),10, replace = TRUE),
                  v3 = sample(c(0,1),10, replace = TRUE),
                  v4 = sample(c(0,1),10, replace = TRUE),
                  v5 = sample(c(0,1),10, replace = TRUE))

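# Custom importance adapter modelled on getImpRfZ.
# Settings carried over from the question:
#   10000 trees, splitrule = 'extratrees', min.node.size = 1,
#   importance = 'impurity', mtry = 2
f_modif <- function(x, y, ntree = 10000, num.trees = ntree, ...) {
  # Boruta passes the predictors (plus shadow attributes) as x and the
  # outcome as y; attach y so ranger can find the response by name
  x$shadow.Boruta.decision <- y
  ranger::ranger(data = x,
                 dependent.variable.name = "shadow.Boruta.decision",
                 num.trees = num.trees,
                 write.forest = FALSE,   # only the importance is needed
                 splitrule = 'extratrees',
                 min.node.size = 1,
                 importance = 'impurity',
                 mtry = 2,
                 ...)$variable.importance  # one value per predictor column
}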


boruta.model <- Boruta(group ~ .,   #outcome & predictors
                       data = dat,
                       pValue = 0.01,
                       doTrace = 2,  # verbosity level
                       maxRuns = 100,
                       getImp = f_modif)
#>  1. run of importance source...
#>  2. run of importance source...
#> ...
print(boruta.model)
#> Boruta performed 26 iterations in 7.530128 secs.
#>  No attributes deemed important.
#>  5 attributes confirmed unimportant: v1, v2, v3, v4, v5;

Created on 2023-04-25 with reprex v2.0.2
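
If you want to sanity-check a custom importance function before handing it to Boruta, here is a minimal base-R sketch of the contract described in the documentation (a function of x and y that returns one numeric score per column of x). Note that check_get_imp and toy_imp are hypothetical names made up for this illustration; they are not part of Boruta or ranger, and the toy importance measure is only a stand-in for a real random forest.

```r
# Hypothetical helper (not part of Boruta): checks that a candidate getImp
# follows the documented contract before it is handed to Boruta().
check_get_imp <- function(get_imp, x, y, ...) {
  if (!is.function(get_imp)) {
    stop("getImp must be a function, not a ", class(get_imp)[1])
  }
  imp <- get_imp(x, y, ...)
  # One numeric score per column of the first argument
  stopifnot(is.numeric(imp), length(imp) == ncol(x))
  invisible(imp)
}

# Toy importance source in base R, for illustration only:
# absolute correlation between each predictor and the 0/1-coded outcome.
toy_imp <- function(x, y, ...) {
  y_num <- as.numeric(y) - 1
  vapply(x, function(col) abs(stats::cor(col, y_num)), numeric(1))
}

set.seed(1)
x <- data.frame(v1 = rnorm(20), v2 = rnorm(20), v3 = rnorm(20))
y <- factor(rep(c("active", "control"), 10))

# Passes: toy_imp is a function and returns one score per column of x
imp <- check_get_imp(toy_imp, x, y)

# A precomputed importance matrix, as in the question, is rejected up front
res <- tryCatch(check_get_imp(as.matrix(imp), x, y),
                error = function(e) conditionMessage(e))
```

The same check can be run on f_modif with a small subset of your own predictors and outcome (this needs ranger installed); if it fails there, it will also fail inside Boruta, usually with a less readable error.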

Wael