mclapply encounters errors depending on core id?

Question

I have a set of genes for which I need to calculate some coefficients in parallel. The coefficients are calculated inside GeneTo_GeneCoeffs_filtered, which takes a gene name as input and returns a list of 2 data frames.

With a gene_array of length 100, I ran this command with different numbers of cores: 5, 6 and 7.

Coeffslist=mclapply(gene_array,GeneTo_GeneCoeffs_filtered,mc.cores = no_cores)

I encounter errors on different gene names depending on the number of cores assigned to mclapply.

The indexes of the genes for which GeneTo_GeneCoeffs_filtered fails to return the list of data frames follow a pattern. With 7 cores assigned to mclapply, they are elements 4, 11, 18, 25, ..., 95 of gene_array (every 7th); with 6 cores, they are 2, 8, 14, ..., 98 (every 6th); and likewise with 5 cores, every 5th.

The most important thing is that the failing indexes differ between these runs, which means the problem is not in particular genes.

I suspect there might be a "broken" core that cannot properly run my function and that it alone generates these errors. Is there a way to trace its id and exclude it from the list of cores that R can use?

Without checking, my gut feeling is that it has to do with the chunking of `gene_array`: the function `GeneTo_GeneCoeffs_filtered(x)` will be called with different `x` chunks depending on the number of chunks, i.e. the value of `ncores`. It could be that one of the chunks contains, say, all missing values. Try to produce `chunks <- mclapply(gene_array, identity, mc.cores = ncores)` and then call each manually with `y <- GeneTo_GeneCoeffs_filtered(chunks[[1]])` etc. to see if one of the chunks is "problematic". (I doubt there's a CPU hardware problem; you'd notice without running R) — HenrikB, Oct 11 '18 at 19:51
Hi @lizaveta - I am wondering if you ever figured this out for yourself. I am having an identical issue: every kth result is an error when the number of cores is set to k. Very strange experience. A question for you: does the function you are applying, GeneTo_GeneCoeffs_filtered, perform any IO to disk or screen? — malcook, Sep 15 '19 at 03:13
Hi @malcook, I was keeping results in memory without writing them on disk inside the function. However, I can't properly recall what was the reason for such a weird behavior - I switched to using a different way of parallelizing the calculation. If I am not mistaken, the problem was in the environments that I had to export to a core — lizaveta, Sep 16 '19 at 10:23

score 1 · Accepted Answer · answered Sep 17 '19 at 17:22

A close reading of mclapply's manpage reveals that this behavior is by design and arises as a result of the interaction between:

(a)

"the input X is split into as many parts as there are cores (currently the values are spread across the cores sequentially, i.e. first value to core 1, second to core 2, ... (core + 1)-th value to core 1 etc.) and then one process is forked to each core and the results are collected."

(b)

a "try-error" object will be returned for all the values involved in the failure, even if not all of them failed.

In your case, by virtue of (a), your gene_array is spread "round-robin" style across the cores (with a gap of mc.cores between the indexes of successive elements), and by virtue of (b), if any gene_array element raises an error, you get back an error for each gene_array element sent to that core (having a gap of mc.cores between the indices of those elements).
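As a minimal sketch of (a) and (b) together, the arithmetic below models the round-robin assignment described in the manpage quote. It is not mclapply's internal code, and `failing_index` is a hypothetical choice of a single bad element, picked only for illustration.

```r
# Model of the prescheduled, round-robin split described in (a):
# element i goes to core ((i - 1) %% mc_cores) + 1.
n_genes  <- 100   # length of gene_array in the question
mc_cores <- 7     # value passed as mc.cores

core_of_index <- (seq_len(n_genes) - 1) %% mc_cores + 1

# Hypothetical: assume only element 25 actually raises an error.
failing_index <- 25

# Per (b), every element scheduled on the same core as the failing one
# comes back as a "try-error".
affected <- which(core_of_index == core_of_index[failing_index])
print(affected)
```

Under these assumptions, a single failing element is enough to produce a whole family of failed indexes spaced mc_cores apart, and changing mc_cores changes which indexes share a core with it.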

I refreshed my understanding of this in an exchange yesterday with Simon Urbanek: https://stat.ethz.ch/pipermail/r-sig-hpc/2019-September/002098.html in which I also provide an error-handling approach yielding errors only for the indices that generate an error.

You can also get errors only for the indices that generate an error by passing mc.preschedule=FALSE.
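The two ways of isolating failures can be sketched as follows. This assumes a Unix-like system (mclapply does not fork on Windows, where mc.cores must be 1). `toy_fun` is a hypothetical stand-in for GeneTo_GeneCoeffs_filtered, and the `tryCatch` wrapper is one common pattern, not necessarily the approach from the linked mailing-list post.

```r
library(parallel)

# Hypothetical stand-in for GeneTo_GeneCoeffs_filtered: fails on one input only.
toy_fun <- function(i) {
  if (i == 25) stop("bad input: ", i)
  i^2
}

# Option 1: catch the error per element, so a failure is returned as a
# condition object for that element only instead of aborting the whole
# batch scheduled on the same core.
safe_fun <- function(i) tryCatch(toy_fun(i), error = function(e) e)
res_safe <- mclapply(1:100, safe_fun, mc.cores = 2)
failed_safe <- which(vapply(res_safe, inherits, logical(1), what = "error"))

# Option 2: disable prescheduling, so each element is evaluated in its own
# forked job and only the failing element yields a "try-error".
# mclapply also emits a warning about the failed job.
res_dyn <- mclapply(1:100, toy_fun, mc.cores = 2, mc.preschedule = FALSE)
failed_dyn <- which(vapply(res_dyn, inherits, logical(1), what = "try-error"))
```

Note that mc.preschedule=FALSE forks one process per element, which can be noticeably slower for many short tasks; the wrapper keeps prescheduling and only changes how errors are reported.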

This makes sense and I think it is exactly what happened back then. Thanks for explanations! — lizaveta, Sep 18 '19 at 09:34
