
I'm using R multicore on a long list. I invoke mclapply on the list, which makes use of 12 cores on my machine.

When my list is about 1000 elements long, it runs fine. When it is longer than ~2000 elements (I'm not sure at what length this behaviour kicks in), all jobs submitted to core 5 fail.

(I found this out by submitting the list element ids to this website.)

I have tried this on several nodes but I always get the following warning:

Warning message:
In mclapply(h.list, train_and_predict, learn.bias = F, ntree = ntree,  :
  scheduled core 5 encountered error in user code, all values of the job will be affected

Q: Why would only one core fail?

Any help will be greatly appreciated.

PK

polarise

1 Answer


I think this happens when at least one datum triggers a bug in your code. multicore cannot recover from the error, so every result computed on that core is lost. mclapply partitions the data evenly across the cores, and if your function fails for even one datum on a core, all values of that core's job are affected, which is what the warning says.

I would suggest the following: run your code on the first M of the N data items; if it fails, gradually decrease M until it works. The (M+1)-th datum is then faulty. Run your code serially (that is, with lapply instead of mclapply) on the (M+1)-th datum only and see where it fails. That is the bug.
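A quicker variant of the same idea is to run the list serially and catch the error per element, so the faulty items identify themselves. This is only a minimal sketch: `f` is a hypothetical stand-in for train_and_predict and `items` for h.list, and the failure on the value 7 is made up for illustration.

```r
# Hypothetical stand-in for train_and_predict: fails on one "bad" input.
f <- function(x) {
  if (x == 7) stop("bad datum: ", x)
  x^2
}

# Hypothetical stand-in for h.list.
items <- as.list(1:10)

# Run serially and catch errors per element instead of letting one
# failure abort the run; a failing element yields a condition object.
results <- lapply(items, function(x) tryCatch(f(x), error = function(e) e))

# Indices of the elements whose call raised an error.
failed <- which(vapply(results, inherits, logical(1), what = "error"))

# Print the index and error message of each faulty element.
for (i in failed) {
  message("element ", i, ": ", conditionMessage(results[[i]]))
}
```

Once `failed` is known, call the function on `items[[failed[1]]]` directly (for example under `debug()`) to see where it breaks. The same tryCatch wrapper can also be passed to mclapply, so that a single bad element no longer marks every value of its job as failed.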

polarise