Why does the elapsed time increase as the number of cores increases?

Question

I am doing multi-core computing in R. I am

Here are the code and the output for each computation. Why does the elapsed time increase as the number of cores increases? This is really counter-intuitive: I would expect the elapsed time to decrease as the number of cores increases. Is there any way to fix this?

Here is the code:

library(parallel)
detectCores()
system.time(pvec(1:1e7, sqrt, mc.cores = 1))
system.time(pvec(1:1e7, sqrt, mc.cores = 4))
system.time(pvec(1:1e7, sqrt, mc.cores = 8))

Thank you.

Please, copy and paste the code so everybody can easily run it, instead of posting an image. — nicola, Apr 08 '16 at 05:53
There is a fair amount of overhead when you call `pvec`. The input vector must be split into chunks and a new job must be created for each chunk. These operations take time. For fast, vectorized operations (like `sqrt`), this approach can actually be slower. See the source code of `pvec` to get a grasp of what's going on. — nicola, Apr 08 '16 at 06:34

jbytecode · Accepted Answer · 2016-04-16T23:13:52.160

Suppose that your data is divided into N parts and that each part takes T seconds to compute. On a single-core machine you expect all operations to take N x T seconds. You might also hope that all of the work is done in T seconds on an N-core machine. However, in parallel computing there is a communication lag incurred by each core (initializing, passing data from the main process to the child, calculating, passing the result back, and finalizing). Now let the communication lag be C seconds and, for simplicity, assume it is the same for all cores. Then, on an N-core machine, the calculations should be done in

T + N x C

seconds, where the T part is for the calculations and the N x C part is for the total communication. Compared with the single-core machine, the inequality

(N x T) > (T + N x C)

must hold for the parallel version to save time, at least under our assumptions. Simplifying the inequality gives

C < (N x T - T) / N

So if the constant communication time is not less than (N x T - T) / N, there is nothing to gain from making this computation parallel.
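To make the arithmetic concrete, here is a minimal R sketch of this toy model. The function names (`serial_time`, `parallel_time`, `break_even_lag`) and the example numbers are made up for illustration; they only encode the simplified assumptions above and are not a measurement of what `pvec` really costs.

```r
# Toy cost model: n_parts chunks, t_part seconds each,
# c_lag seconds of communication overhead per core.
# All names and numbers here are illustrative assumptions.
serial_time    <- function(n_parts, t_part) n_parts * t_part
parallel_time  <- function(n_parts, t_part, c_lag) t_part + n_parts * c_lag
break_even_lag <- function(n_parts, t_part) (n_parts * t_part - t_part) / n_parts

n_parts <- 4
t_part  <- 1

break_even_lag(n_parts, t_part)
# 0.75: the lag per core must stay below this for a gain

parallel_time(n_parts, t_part, c_lag = 0.5) < serial_time(n_parts, t_part)
# TRUE: 1 + 4 * 0.5 = 3 seconds versus 4 seconds

parallel_time(n_parts, t_part, c_lag = 1.0) < serial_time(n_parts, t_part)
# FALSE: 1 + 4 * 1.0 = 5 seconds versus 4 seconds
```

With a cheap operation T is tiny, so the break-even lag is tiny too, and almost any real overhead exceeds it.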

In your example, the time needed for creation, calculation and communication is greater than the time of the single-core computation for the function sqrt.
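For contrast, here is a rough sketch of the opposite case, where each unit of work is expensive relative to the overhead. `slow_task` is a hypothetical function, and `Sys.sleep` is only a stand-in for a costly computation. I use `mclapply` here rather than `pvec` because the work is per element; forking is not available on Windows, so the sketch falls back to one core there.

```r
library(parallel)

# Hypothetical expensive task: the sleep stands in for real work.
slow_task <- function(i) {
  Sys.sleep(0.2)
  sqrt(i)
}

# mc.cores > 1 is not supported on Windows, so use one core there.
cores <- if (.Platform$OS.type == "windows") 1L else 2L

t_serial   <- system.time(res_serial   <- lapply(1:4, slow_task))
t_parallel <- system.time(res_parallel <- mclapply(1:4, slow_task, mc.cores = cores))

# Both approaches should return the same values.
identical(res_serial, res_parallel)

# Compare wall-clock times; the exact numbers depend on the machine.
t_serial["elapsed"]
t_parallel["elapsed"]
```

When each task takes much longer than the cost of creating a job and passing data around, I would expect the parallel run to show a smaller elapsed time on a Unix-like machine, which is the situation the inequality above describes.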
