All jobs on one core fail with R multicore

Question

I'm using the R multicore package on a long list. I call mclapply on the list, which uses 12 cores on my machine.

When my list has about 1000 elements, it runs fine. When it is longer than ~2000 elements (I'm not sure exactly at what length this behaviour kicks in), all jobs submitted to core 5 fail.

(I found this out by submitting the list element ids to this website.)

I have tried this on several nodes but I always get the following warning:

Warning message:
In mclapply(h.list, train_and_predict, learn.bias = F, ntree = ntree,  :
  scheduled core 5 encountered error in user code, all values of the job will be affected

Q: Why would only one core fail?

Any help will be greatly appreciated.

PK

Please provide a reproducible example which reproduces this issue. — Paul Hiemstra, Feb 01 '13 at 20:07
When this happened to me it was always because some element of the full list was breaking my code -- while none on the shorter list did. — Ryogi, Feb 01 '13 at 20:43
@Paul Hiemstra: The data I used is quite big (~100MiB). Would you be able to grab that? — polarise, Feb 01 '13 at 22:05
You can provide us with a link to the file, without such an example it is hard to go beyond speculation as to why this happens, and how to fix it. — Paul Hiemstra, Feb 02 '13 at 08:15

score 2 · Answer 1 · answered Aug 13 '14 at 09:20

I think this happens when at least one datum triggers a bug in your code. mclapply splits the data evenly across the cores up front, and each core processes its share as a single job. If your function fails on even one datum in that share, the job cannot recover, so every value returned by that core is affected: they all come back as the same error object.
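A minimal sketch of why one bad datum takes down a whole core, assuming the `parallel` package (which absorbed `multicore` in R 2.14.0) and its default `mc.preschedule = TRUE`. As far as I can tell from its source, the input is split round-robin, so element `i` goes to job `((i - 1) %% mc.cores) + 1`, and each job runs `lapply` inside `try()`, which is why one error replaces every result in that job. Wrapping the function in `tryCatch` confines the error to the element that caused it. The names `scheduled_core`, `safely`, and `square_unless_17` are hypothetical, for illustration only.

```r
library(parallel)  # successor to the multicore package (assumption: R >= 2.14.0)

# Hypothetical helper: the prescheduled job (1..cores) that element i lands in,
# assuming mclapply's round-robin split when mc.preschedule = TRUE.
scheduled_core <- function(i, cores) ((i - 1L) %% cores) + 1L

# Hypothetical helper: wrap FUN so an error on one element is returned as a
# condition object for that element only, instead of aborting its whole job.
safely <- function(FUN) {
  function(x, ...) tryCatch(FUN(x, ...), error = function(e) e)
}

# Hypothetical user function that breaks on exactly one datum.
square_unless_17 <- function(x) {
  if (x == 17) stop("bad datum: ", x)
  x^2
}

# mc.cores > 1 needs fork(), i.e. a Unix-alike; use mc.cores = 1L on Windows.
res <- mclapply(1:24, safely(square_unless_17), mc.cores = 2L)

# Indices whose result is an error condition rather than a value.
bad <- which(vapply(res, inherits, logical(1), what = "error"))
print(bad)
print(scheduled_core(bad, 12L))  # job this element would land in with 12 cores
```

With 12 cores, job 5 would receive elements 5, 17, 29, and so on, so if the list and core count stay the same between runs, the culprit in the question should be among the indices with `(i - 1) %% 12 == 4`.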

I would suggest the following approach: run your code on the first M of the N data items; if it fails, gradually decrease M until it works. The (M+1)-th datum is then the faulty one. Run your code serially (that is, without mclapply) on the (M+1)-th datum alone and see where it fails; that will point you to the bug.
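The search described above can be sketched like this. It swaps the step-by-step decrease of M for a binary search over the prefix length, which finds the same M in about log2(N) runs instead of up to N, assuming the function is deterministic and each element is processed independently. `prefix_ok`, `first_bad_index`, and the stray-string data are hypothetical; `sqrt` stands in for your own function (for example `train_and_predict`).

```r
library(parallel)

# TRUE if FUN runs cleanly on the first m elements of X via mclapply.
# A failing prescheduled job comes back as "try-error" values; when mclapply
# falls back to plain lapply (too few elements or cores), the error is thrown
# directly instead, so it is caught here as well.
prefix_ok <- function(X, FUN, m, cores) {
  if (m == 0L) return(TRUE)
  res <- tryCatch(
    suppressWarnings(mclapply(X[seq_len(m)], FUN, mc.cores = cores)),
    error = function(e) NULL
  )
  !is.null(res) && !any(vapply(res, inherits, logical(1), what = "try-error"))
}

# Binary search for M, the longest prefix that runs cleanly.
# Invariant: the first `lo` elements succeed and the first `hi` fail, so the
# loop ends with hi == M + 1, the index of the first faulty datum.
# Defaults to 1 core on Windows, where mc.cores > 1 is not supported.
first_bad_index <- function(X, FUN,
                            cores = if (.Platform$OS.type == "windows") 1L else 2L) {
  lo <- 0L
  hi <- length(X)
  if (prefix_ok(X, FUN, hi, cores)) return(NA_integer_)  # nothing fails
  while (hi - lo > 1L) {
    mid <- (lo + hi) %/% 2L
    if (prefix_ok(X, FUN, mid, cores)) lo <- mid else hi <- mid
  }
  hi
}

# Hypothetical data: one stray string hidden among numbers.
X <- c(as.list(1:30), list("oops"), as.list(32:40))
i <- first_bad_index(X, sqrt)
print(i)
# Re-run just that datum serially, outside mclapply, to see the real error.
# Calling your own function this way at top level also lets you use
# traceback() or debug() on it.
print(tryCatch(sqrt(X[[i]]), error = conditionMessage))
```

Each probe is a full mclapply call on a prefix, so on a large list it may be cheaper to run the probes serially with `lapply` instead; the search logic stays the same.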
