
I am doing multi-core computing in R. I am

Here are the code and the output of each computation. Why does the elapsed time increase as the number of cores increases? This is counter-intuitive: I would expect the elapsed time to decrease as the number of cores increases. Is there any way to fix this?


Here is the code:

library(parallel)
detectCores()
system.time(pvec(1:1e7, sqrt, mc.cores = 1))
system.time(pvec(1:1e7, sqrt, mc.cores = 4))
system.time(pvec(1:1e7, sqrt, mc.cores = 8))

Thank you.

Shijia Bian
  • Please, copy and paste the code so everybody can easily run it, instead of posting an image. – nicola Apr 08 '16 at 05:53
  • Thank you. I just added the code! – Shijia Bian Apr 08 '16 at 05:56
  • There is a fair amount of overhead when you call `pvec`. The input vector must be split in chunks and a new job for each chunk must be created. These operations take time. For fast and vectorized operations (like `sqrt`), this approach can actually be slower. See the source code of `pvec` to have a grasp of what's going on. – nicola Apr 08 '16 at 06:34
  • Thank you! @nicola – Shijia Bian Apr 15 '16 at 21:34

1 Answer


Suppose your data is divided into N parts and each part takes T seconds to compute. On a single core you expect all the work to take N x T seconds. You might hope that an N-core machine finishes everything in T seconds. However, parallel computing adds a communication lag for each core: initializing, passing data from the main process to the child, passing the result back, and finalizing. Let this lag be C seconds and, for simplicity, constant across cores. Then on an N-core machine the computation takes

T + N x C

seconds, where T is the computation time and N x C is the total communication time. Comparing this with the single-core machine, the inequality

(N x T) > (T + N x C)

must hold for parallelism to save time, at least under our assumptions. Simplifying the inequality gives

C < (N x T - T) / N

So if the constant communication time is not less than (N x T - T) / N, parallelizing this computation gives no gain.
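As a rough numerical illustration of this inequality, here is a minimal R sketch of the cost model. The function names (`serial_time`, `parallel_time`, `break_even_lag`) and all the numbers are made up for the example; they are not measurements of `pvec`.

```r
# Toy cost model: N parts, T seconds per part, C seconds of lag per core.
# All names and values below are illustrative assumptions.
serial_time    <- function(n, t_part)      n * t_part               # N x T
parallel_time  <- function(n, t_part, lag) t_part + n * lag         # T + N x C
break_even_lag <- function(n, t_part)      (n * t_part - t_part) / n

n      <- 4     # number of parts / cores (assumed)
t_part <- 0.01  # seconds of pure computation per part (assumed)

# Largest lag per core for which going parallel still pays off
print(break_even_lag(n, t_part))

# A small lag: the parallel version wins under this model
print(parallel_time(n, t_part, lag = 0.001) < serial_time(n, t_part))

# A large lag: the parallel version loses under this model
print(parallel_time(n, t_part, lag = 0.02) < serial_time(n, t_part))
```

With these assumed values the break-even lag is (4 x 0.01 - 0.01) / 4 seconds, so a lag above that value makes the parallel run slower than the serial one.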

In your example, the time needed for process creation, calculation and communication is greater than the time the single-core computation of sqrt takes.
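To see the other side of the trade-off, you can try a task where each element costs much more than a single `sqrt` call. The sketch below is only an illustration: `slow_task` is a hypothetical stand-in for expensive work, the core count of 2 is an assumption, and the timings will differ from machine to machine, so no expected numbers are shown.

```r
library(parallel)

# mclapply relies on forking, which is not available on Windows,
# so fall back to a single core there.
n_cores <- if (.Platform$OS.type == "windows") 1L else 2L

# Hypothetical expensive task: each call does a noticeable amount of work,
# so the per-job overhead is small compared with the computation itself.
slow_task <- function(i) {
  x <- 0
  for (k in seq_len(2e5)) x <- x + sqrt(k + i)
  x
}

inputs <- 1:40

# Time the serial and the forked versions, keeping the results
serial_timing   <- system.time(serial_result   <- lapply(inputs, slow_task))
parallel_timing <- system.time(parallel_result <- mclapply(inputs, slow_task,
                                                           mc.cores = n_cores))

# Both approaches should return the same values
stopifnot(identical(serial_result, parallel_result))

print(serial_timing)
print(parallel_timing)
```

If the elapsed time of the `mclapply` call is lower here, that is consistent with the model above: the computation per job is large enough to outweigh the cost of creating the jobs and passing data around.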

jbytecode