0

I am asking for help with this example. It might not be related to the pbapply package but rather to mclapply(). However, maybe someone has an idea whats going on here.

Problem

The only difference is the use of pblapply()/mclapply() instead of lapply() (I'm on macOS). In this example, the pblapply() approach returns NULL while the lapply() attempt works just fine.

The interesting point is that it only behaves like in this example in which I use lda(). Other examples using glm() or other modelling methods work fine.

It seems that the parallel apply call does not even start here (based on how quickly it finishes).

sperrorest() is a generic framework (and therefore should not be the problem here) and I am confused why the example works using the sequential approach (lapply()) but not with the parallel one (pblapply()).

Also, doing the same with foreach() instead of using an apply* function works fine. Hence, I assume that it is unrelated to lda() and related to the parallel apply* call.

Notes:

The example runs on two cores so it should run on any machine.

First, please do devtools::install_github("pat-s/sperrorest@mclapply-vs-lapply").

Code (reproducible)

library(MASS)
library(sperrorest)
library(parallel)
library(pbapply)

currentSample <- partition.cv(maipo, nfold = 4)
currentSample[[2]] <- partition.cv(maipo, nfold = 4)[[1]]
currentRes <- currentSample

lda.predfun <- function(object, newdata, fac = NULL) {
  library(nnet)
  majority <- function(x) {
    levels(x)[which.is.max(table(x))]
  }

  majority.filter <- function(x, fac) {
    for (lev in levels(fac)) {
      x[ fac == lev ] <- majority(x[ fac == lev ])
    }
    x
  }

  pred <- predict(object, newdata = newdata)$class
  if (!is.null(fac)) pred <- majority.filter(pred, newdata[,fac])
  return(pred)
}

data("maipo", package = "sperrorest")
predictors <- colnames(maipo)[5:ncol(maipo)]
fo <- as.formula(paste("croptype ~", paste(predictors, collapse = "+")))


# pblapply attempt (not working)
runreps_res <- pblapply(cl = 2, currentSample, function(X) 
  runreps(currentSample = X, data = maipo,
          formula = fo, par.mode = 1, pred.fun = lda.predfun,
          do.try = FALSE, model.fun = lda,
          error.fold = TRUE, error.rep = TRUE, do.gc = 1,
          err.train = TRUE, importance = FALSE, currentRes = currentRes, 
          pred.args = list(fac = "field"), response = "croptype", par.cl = 2, 
          coords = c("x", "y"), progress = 1, pooled.obs.train = c(), 
          pooled.obs.test = c(), err.fun = err.default))

# mclapply attempt (not working)
runreps_res <- mclapply(mc.cores = 2, currentSample, function(X) 
  runreps(currentSample = X, data = maipo,
          formula = fo, par.mode = 1, pred.fun = lda.predfun,
          do.try = FALSE, model.fun = lda,
          error.fold = TRUE, error.rep = TRUE, do.gc = 1,
          err.train = TRUE, importance = FALSE, currentRes = currentRes, 
          pred.args = list(fac = "field"), response = "croptype", par.cl = 2, 
          coords = c("x", "y"), progress = 1, pooled.obs.train = c(), 
          pooled.obs.test = c(), err.fun = err.default))

# lapply attempt (working)
runreps_res <- lapply(currentSample, function(X) 
  runreps(currentSample = X, data = maipo,
          formula = fo, par.mode = 1, pred.fun = lda.predfun,
          do.try = FALSE, model.fun = lda,
          error.fold = TRUE, error.rep = TRUE, do.gc = 1,
          err.train = TRUE, importance = FALSE, currentRes = currentRes, 
          pred.args = list(fac = "field"), response = "croptype", par.cl = 2, 
          coords = c("x", "y"), progress = 1, pooled.obs.train = c(), 
          pooled.obs.test = c(), err.fun = err.default))
pat-s
  • 5,992
  • 1
  • 32
  • 60
  • Works on Linux with R 3.4.0. Make sure to retry in a fresh R session, i.e. `R --vanilla`. FYI, if you look at the code for `pblapply()` you'll see that it internally calls `mclapply()` when you use `cl = 2`. In other words, troubleshooting `pblapply()` is the same as troubleshooting `mclapply()`. Alternatively, see if it works with `future_lapply()` of the future package when you use `plan(multiprocess)`. – HenrikB May 19 '17 at 04:22
  • 1
    Hi @HenrikB thanks. It works indeed on Linux and macOS with `R --vanilla` and on Windows but crashes on normal macOS R otherwise. Now I need to think how to deal with that issue. First checking your `future_lapply()` to see if that helps on macOS. I was considering using `pbmclapply()` anyways which uses `future` in the background :) – pat-s May 19 '17 at 11:36

0 Answers0