An error inside mclapply is difficult to debug because it invalidates every value of the job that was running on the affected core, not just the iteration that failed.

I prepared a simple example:

library(parallel)
library(dplyr)

data(iris)

## Parallel Version
parFun <- function(i){
  print(i)
  ## Generate a random subset of the iris data set
  daf <- iris[sample(1:nrow(iris),10),]
  
  ## Simulated bug at iteration 39: some internal function returned NULL
  if(i == 39){
    daf <- NULL
  }
  
  ## dplyr throws an error when daf is NULL; an if test for NULL is missing here
  res <- daf %>% group_by(Species) %>% slice_min(order_by = Petal.Width, n = 2)
  
  return(res)
}

## This call triggers the warning:
## "Scheduled core 3 encountered error in user code, all values of the job will be affected"
resList <- mclapply(1:50,parFun,mc.cores=12)
idx <- sapply(resList,function(x){is.null(nrow(x))})

## Depending on the number of cores, a whole sequence of iterations is affected
which(idx)
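
The pattern of affected indices can be reproduced by hand. As far as I understand the default behaviour (mc.preschedule = TRUE), mclapply splits the input round-robin across the cores, so one failing iteration takes the other iterations of the same core down with it. The sketch below only redoes that arithmetic for the call above; n_iter, n_cores and bad_i are illustrative names, not part of the parallel package.

```r
## Values taken from the example above
n_iter  <- 50   # length of 1:50
n_cores <- 12   # mc.cores
bad_i   <- 39   # iteration with the simulated bug

## Round-robin split: core k gets k, k + n_cores, k + 2 * n_cores, ...
jobs <- lapply(seq_len(n_cores), function(k) seq(k, n_iter, by = n_cores))

## Which core received the failing iteration, and what else was in its job?
core <- which(vapply(jobs, function(j) bad_i %in% j, logical(1)))
print(core)
print(jobs[[core]])
```

This matches the "core 3" in the warning, but it only narrows the failure down to one job, not to a single i.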

How can I debug such code when it runs for several thousand iterations? How can I find the single i that causes the error?

Peter Pisher
  • Did you load the libraries on the cluster before executing the code? What is the error? – mhovd Jun 29 '21 at 09:23
  • My question is not about the error itself. The error is "Scheduled core 3 encountered error in user code, all values of the job will be affected", and it is caused by a missing if statement before the dplyr statement. My question is how to find the iteration i that causes the error. In this case the answer would be i = 39, but in other, larger scripts this is not easy to find. – Peter Pisher Jun 30 '21 at 07:19
  • Sorry, I misunderstood the question, and now I see what you are really asking. I will try to think of a solution. – mhovd Jun 30 '21 at 09:00
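
One possible approach, as a minimal sketch rather than a definitive answer: catch the error inside the worker and return it as an ordinary value tagged with i, so the failure never reaches mclapply and the other iterations of the same job keep their results. The parFun here is a simplified stand-in for the one in the question (it just calls stop() at i == 39), and safeFun and the "iter_error" class are hypothetical names introduced for illustration.

```r
library(parallel)

## Simplified stand-in for parFun from the question: fails only for i == 39
parFun <- function(i) {
  if (i == 39) stop("daf is NULL")
  i^2
}

## Wrapper: run parFun(i) and, on error, return a small tagged object
## holding the index and the message instead of propagating the error
safeFun <- function(i) {
  tryCatch(
    parFun(i),
    error = function(e) {
      structure(list(i = i, message = conditionMessage(e)),
                class = "iter_error")
    }
  )
}

## mc.cores = 2 is an arbitrary choice; forking is not available on Windows
resList <- mclapply(1:50, safeFun, mc.cores = 2)

## Indices whose result is a captured error
failed <- which(vapply(resList, inherits, logical(1), what = "iter_error"))
print(failed)
print(resList[[failed[1]]]$message)
```

Once the failing i is known, parFun(i) can be run on its own in an interactive session. Setting mc.preschedule = FALSE might be an alternative, since it forks one job per value, at the cost of more forking overhead.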
