I am trying to use foreach to draw partial dependence plots from the randomForest package in parallel, but I get this error:

Error in { : task 1 failed - undefined columns selected

library(randomForest)
library(doParallel)
library(mlbench)
data(Sonar)
registerDoParallel(cores = 8)

Sonar.rrf <- randomForest(
  Sonar[-61],
  Sonar[[61]],
  ntree = 101,
  oob.prox = FALSE,
  importance = TRUE
)

Sonarimp <- importance(Sonar.rrf)

m.list <- foreach(
  i = 1:10,
  .combine = 'partialPlot',
  .init = NULL,
  .multicombine = TRUE,
  .inorder = FALSE,
  .packages = 'randomForest'
) %dopar% {
  impvar <- rownames(Sonarimp)[order(Sonarimp[, 1], decreasing = TRUE)]
  imptvar <- impvar[i]
  partialPlot(
    x = Sonar.rrf,
    pred.data = Sonar,
    x.var = imptvar,
    which.class = "R",
    xlab = imptvar,
    main = paste("Partial Dependence on", imptvar),
    ylim = c(30, 70)
  )
}
Scott
  • In your call to `randomForest`, what do `iris[-1]` and `iris[[1]]` represent? I know what this does in R, but are you trying to tell `randomForest` to predict `Sepal.Length` with the rest of the columns in `iris`? – Bryan Goggin Jun 17 '16 at 18:24
  • Sorry, I took the example from the randomForest help. It's probably not the right example, given that I am using a classification tree. I will edit the question. Thank you. – Scott Jun 17 '16 at 18:31
  • Sorry for all the edits. Should be correct now. – Scott Jun 17 '16 at 18:39
  • Is it necessary to do the plots in parallel? I think that is the snag. It works fine with a regular `for` loop. – Bryan Goggin Jun 17 '16 at 19:10
  • Hi @BryanGoggin. Yeah, partialPlot takes a long time on large data sets (see the randomForest help), so I was trying to figure out a way to speed up the process. I appreciate you looking into it. – Scott Jun 17 '16 at 19:40
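
One way to reconcile the two comments above is to run only the slow computation on the workers and draw the plots afterwards in a regular `for` loop. The sketch below is an untested-against-your-setup suggestion, not a confirmed fix. It rests on two assumptions: as far as I can tell from the randomForest source, `partialPlot` captures `x.var` unevaluated with `substitute()`, so a bare variable name such as `imptvar` is taken to be the column name itself, which would be consistent with "undefined columns selected"; and `.combine = 'partialPlot'` asks foreach to merge the results by calling `partialPlot` on them, which is probably not what was intended. The names `cl`, `pd.list` and `k`, and the choice of 2 workers, are placeholders for illustration.

```r
library(randomForest)
library(mlbench)
library(doParallel)  # also attaches foreach and parallel

data(Sonar)
set.seed(1)
Sonar.rrf <- randomForest(Sonar[-61], Sonar[[61]],
                          ntree = 101, importance = TRUE)

# Rank the predictors once, outside the parallel loop.
Sonarimp <- importance(Sonar.rrf)
impvar <- rownames(Sonarimp)[order(Sonarimp[, 1], decreasing = TRUE)]

cl <- makeCluster(2)  # assumption: 2 workers; adjust to your machine
registerDoParallel(cl)

# Compute the curves on the workers without drawing anything.
# do.call() evaluates impvar[i] first, so partialPlot receives the
# column name as a character string rather than an unevaluated symbol.
pd.list <- foreach(i = 1:10, .packages = "randomForest") %dopar% {
  do.call(partialPlot, list(
    x = Sonar.rrf,
    pred.data = Sonar,
    x.var = impvar[i],
    which.class = "R",
    plot = FALSE,
    rug = FALSE
  ))
}

stopCluster(cl)

# Draw the plots sequentially in the main session.
for (k in seq_along(pd.list)) {
  plot(pd.list[[k]], type = "l", xlab = impvar[k], ylab = "",
       main = paste("Partial Dependence on", impvar[k]))
}
```

Without `.combine`, foreach returns a plain list in task order, so `pd.list[[k]]` lines up with `impvar[k]`. The `ylim = c(30, 70)` from the question is left out here because the curve for a classification forest may not fall in that range.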

0 Answers