
I find that the speed gain from using the future (and furrr) packages for parallelization in R is unsatisfactory. In particular, the improvement is not close to linear. My machine has 4 cores, so I expected the speed-up to be roughly linear as long as the number of workers I specify does not exceed the number of cores available. However, that is not the case.
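For reference, the core count that the future framework sees, and the worker count of the current plan, can be checked directly (a small sketch; the printed values are machine-dependent, so none are shown here):

```r
library(future)

# Number of cores the future framework detects on this machine
availableCores()

plan(multisession, workers = 4)
# Number of workers actually used by the current plan
nbrOfWorkers()
```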

The following example illustrates the problem: I draw 10^7 random numbers and compute their mean, repeated 500 times.

library(future)
library(furrr)

# Parameters
n <- 1e7
m <- 500

# Compute the mean of n uniform random numbers
# (x is the iteration index passed by future_map; it is unused)
rmean <- function(x, n) {
  rand.vec <- runif(n)
  rand.mean <- mean(rand.vec)
  return(rand.mean)
}

# Record the time used to compute the mean of n numbers for m times
rtime <- function(m, n) {
  t1 <- Sys.time()
  temp <- future_map(.x = 1:m,
                     .f = rmean,
                     n = n,
                     .options = furrr::furrr_options(seed = TRUE))
  t2 <- Sys.time()
  # Print the time used
  print(t2 - t1)
  return(temp)
}

# Print the time used for different numbers of workers
plan(multisession, workers = 1)
set.seed(1)
x <- rtime(m, n)
# Time difference of 2.503885 mins

plan(multisession, workers = 2)
set.seed(1)
x <- rtime(m, n)
# Time difference of 1.341357 mins

plan(multisession, workers = 3)
set.seed(1)
x <- rtime(m, n)
# Time difference of 57.25641 secs

plan(multisession, workers = 4)
set.seed(1)
x <- rtime(m, n)
# Time difference of 47.31929 secs

In the above example, the speed gains that I get are:

  • 1.87x for 2 workers
  • 2.62x for 3 workers
  • 3.17x for 4 workers

The speed gain in the above example is not close to linear, especially with 4 workers. I thought this might be due to overhead from the plan function. However, the timings are similar when I run the procedure several times after setting the number of workers once, as illustrated below:

plan(multisession, workers = 3)
set.seed(1)
x <- rtime(m, n)
# Time difference of 58.07243 secs
set.seed(1)
x <- rtime(m, n)
# Time difference of 1.012799 mins
set.seed(1)
x <- rtime(m, n)
# Time difference of 57.96777 secs

I also tried the future_lapply function from the future.apply package instead of future_map from furrr, but the speed gain was similar. I would appreciate any advice on what is going on here. Thank you!
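For reference, the future.apply variant I tried is roughly the following (same helper and parameters as above; future.seed = TRUE is future.apply's counterpart of furrr's seed option):

```r
library(future)
library(future.apply)

# Same helper as above: mean of n uniform draws
# (x is the iteration index and is unused)
rmean <- function(x, n) {
  mean(runif(n))
}

# Time the computation of the mean of n numbers, m times
rtime_lapply <- function(m, n) {
  t1 <- Sys.time()
  temp <- future_lapply(1:m, rmean, n = n, future.seed = TRUE)
  t2 <- Sys.time()
  print(t2 - t1)
  return(temp)
}

plan(multisession, workers = 4)
x <- rtime_lapply(500, 1e7)
```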

rick
  • I get {1.92 mins, 59.5 secs, 40.3 secs, 30.3 secs} or relative speeds {1.93, 2.85, 3.8}. Memory allocation time? – Ben Bolker Jan 24 '21 at 00:09
  • Hi @BenBolker, it looks like you have a much better speed gain than mine! Do you have any suggestions for reducing the memory allocation time? Or would you think that it is my machine's issue? For your info, I am using a Mac with 16GB memory and 2.8 GHz Quad-Core processor. Thanks in advance! – rick Jan 24 '21 at 06:02
  • The speed gains are quite good. It is very rare to get exactly the number of cores. – F. Privé Jan 24 '21 at 07:14
  • I understand it is difficult to be exactly linear, but the speed gain that I get for 4 workers is quite far from linear. The speed gain that @BenBolker has looks much better than mine. The 0.63x difference in the 4-worker case does not seem to be negligible, especially for large-scale projects. So, I would appreciate any advice on what I can do to get a better speed up and why my speed gain is smaller when we run the same script. Thanks. – rick Jan 26 '21 at 06:12
  • on some computers, setting `furrr_options(scheduling = 2)` gives a little bit of performance gain. – Agile Bean May 31 '21 at 13:40
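Applied to the example above, the scheduling suggestion from the last comment would look roughly like this (a sketch; whether it helps is machine-dependent):

```r
library(future)
library(furrr)

plan(multisession, workers = 4)
# scheduling = 2 splits the work into twice as many chunks as workers,
# which can balance load better at the cost of some extra dispatch overhead
res <- future_map(1:500,
                  ~ mean(runif(1e7)),
                  .options = furrr_options(seed = TRUE, scheduling = 2))
```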

0 Answers