
I have a set of genes for which I need to calculate some coefficients in parallel. The coefficients are calculated inside `GeneTo_GeneCoeffs_filtered`, which takes a gene name as input and returns a list of 2 data frames.

With a `gene_array` of length 100, I ran this command with different numbers of cores: 5, 6 and 7.

```r
Coeffslist <- mclapply(gene_array, GeneTo_GeneCoeffs_filtered, mc.cores = no_cores)
```

I get errors for different genes depending on the number of cores assigned to `mclapply`.

The indices of the genes for which `GeneTo_GeneCoeffs_filtered` fails to return the list of data frames follow a pattern. With 7 cores, the failures are at elements 4, 11, 18, 25, ..., 95 of `gene_array` (every 7th); with 6 cores they are at 2, 8, 14, ..., 98 (every 6th); and with 5 cores, likewise, at every 5th element.

Crucially, the failing indices differ between these runs, which suggests the problem is not in particular genes.

I suspect there might be a "broken" core that cannot run my function properly and that it alone generates these errors. Is there a way to trace back its id and exclude it from the list of cores R can use?

lizaveta
  • Without checking, my gut feeling is that it has to do with the chunking of `gene_array`: the function `GeneTo_GeneCoeffs_filtered(x)` will be called with different `x` chunks depending on the number of chunks, i.e. the value of `ncores`. It could be that one of the chunks contains, say, all missing values. Try producing `chunks <- mclapply(gene_array, identity, mc.cores = ncores)` and then calling each one manually with `y <- GeneTo_GeneCoeffs_filtered(chunks[[1]])` etc. to see whether one of the chunks is "problematic". (I doubt there's a CPU hardware problem; you'd notice it even without running R.) – HenrikB Oct 11 '18 at 19:51
  • Hi @lizaveta, I am wondering if you ever figured this out. I am having an identical issue: every kth result is an error when the number of cores is set to k. Very strange. A question for you: does the function you are applying, `GeneTo_GeneCoeffs_filtered`, perform any IO to disk or screen? – malcook Sep 15 '19 at 03:13
  • Hi @malcook, I was keeping results in memory inside the function without writing them to disk. However, I can't properly recall what caused such weird behavior; I switched to a different way of parallelizing the calculation. If I am not mistaken, the problem was in the environments that I had to export to a core. – lizaveta Sep 16 '19 at 10:23

1 Answer


A close reading of the `mclapply` man page shows that this behavior is by design. It arises from the interaction between:

(a)

"the input X is split into as many parts as there are cores (currently the values are spread across the cores sequentially, i.e. first value to core 1, second to core 2, ... (core + 1)-th value to core 1 etc.) and then one process is forked to each core and the results are collected."

(b)

a "try-error" object will be returned for all the values involved in the failure, even if not all of them failed.
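Point (a) can be checked without forking anything. The sketch below recomputes the round-robin assignment in base R for the sizes from the question. `round_robin()` is a hypothetical helper written only for illustration (it is not part of the `parallel` package) and simply encodes the spreading rule quoted above.

```r
# Hypothetical helper (not part of 'parallel'): recompute the round-robin
# spreading quoted from ?mclapply for mc.preschedule = TRUE, where
# element i is sent to core ((i - 1) %% cores) + 1.
round_robin <- function(n, cores) {
  idx <- seq_len(n)
  core_of <- (idx - 1L) %% cores + 1L
  split(idx, core_of)  # list with one vector of element indices per core
}

# 100 genes on 7 cores: the core that gets element 4 also gets 11, 18, ..., 95
round_robin(100, 7)[[4]]

# 100 genes on 6 cores: the core that gets element 2 also gets 8, 14, ..., 98
round_robin(100, 6)[[2]]
```

These are exactly the index sets reported in the question, which is what you would expect if each set corresponds to one forked process.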

In your case, by virtue of (a), your `gene_array` is spread across the cores "round-robin" style (elements handled by the same core are `mc.cores` positions apart), and by virtue of (b), if any `gene_array` element raises an error, you get back an error for every `gene_array` element sent to that core (again spaced `mc.cores` apart). So at least one gene in each failing set really does raise an error; the rest of that set fails only because it shared a process with it.
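A minimal reproduction of this effect, assuming a Unix-like OS (on Windows, `mclapply` cannot use more than one core). `fails_on_4()` is a hypothetical stand-in for `GeneTo_GeneCoeffs_filtered` that errors on element 4 only.

```r
library(parallel)

# Hypothetical stand-in for GeneTo_GeneCoeffs_filtered: only element 4 is "bad".
fails_on_4 <- function(i) {
  if (i == 4) stop("bad input for element 4")
  i^2
}

# Default mc.preschedule = TRUE: with 7 cores, element 4 is processed in the
# same forked process as elements 11 and 18.
res <- mclapply(1:20, fails_on_4, mc.cores = 7)

# Which results came back as "try-error" objects?
failed <- which(vapply(res, inherits, logical(1), what = "try-error"))
failed
```

Elements 11 and 18 come back as the same "try-error" object as element 4 even though `fails_on_4()` succeeds for them, and `mclapply` should also warn that a scheduled core encountered an error in user code. Re-running with `mc.cores = 6` moves the collateral failures to elements 10 and 16.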

I refreshed my understanding of this in an exchange yesterday with Simon Urbanek: https://stat.ethz.ch/pipermail/r-sig-hpc/2019-September/002098.html in which I also provide an error-handling approach yielding errors only for the indices that generate an error.
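One way to get per-element errors while keeping prescheduling is to catch errors inside the applied function, so the `lapply` running in each forked process never fails as a whole. This is a sketch of that idea, not necessarily the exact approach from the linked thread; `with_own_errors()` and `fails_on_4()` are hypothetical names, and a Unix-like OS is assumed.

```r
library(parallel)

# Hypothetical stand-in for GeneTo_GeneCoeffs_filtered: only element 4 is "bad".
fails_on_4 <- function(i) {
  if (i == 4) stop("bad input for element 4")
  i^2
}

# Hypothetical wrapper: each call returns its own condition object on error,
# so one bad element no longer takes down the whole prescheduled chunk.
with_own_errors <- function(f) {
  function(x) tryCatch(f(x), error = function(e) e)
}

res <- mclapply(1:20, with_own_errors(fails_on_4), mc.cores = 7)

# Only the element that actually failed carries an error condition.
failed <- which(vapply(res, inherits, logical(1), what = "error"))
failed
conditionMessage(res[[4]])
```

Because the error is caught per element, elements 11 and 18 keep their normal results, and the failing gene can be inspected directly via its condition message.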

You can also get errors only for the indices that generate an error by passing mc.preschedule=FALSE.
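For comparison, here is the same hypothetical `fails_on_4()` run with `mc.preschedule = FALSE`, again assuming a Unix-like OS. In this mode `mclapply` forks one job per element of `X` (at most `mc.cores` at a time), so only element 4 should come back as a "try-error".

```r
library(parallel)

# Hypothetical stand-in for GeneTo_GeneCoeffs_filtered: only element 4 is "bad".
fails_on_4 <- function(i) {
  if (i == 4) stop("bad input for element 4")
  i^2
}

# No prescheduling: each element runs in its own forked job, so an error
# is confined to the element that raised it.
res <- mclapply(1:20, fails_on_4, mc.cores = 7, mc.preschedule = FALSE)

failed <- which(vapply(res, inherits, logical(1), what = "try-error"))
failed
```

The trade-off is one fork per element: the man page recommends prescheduling for short computations or many values of `X`, and `mc.preschedule = FALSE` for jobs with highly variable run times and not too many values relative to `mc.cores`.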

malcook