2

I am using sfApply from the R snowfall package for parallel computing. There are 32,000 tests to run. The code works fine when the computation starts: it creates 46 Rscript.exe processes, each with about 2% CPU usage, the overall CPU usage is about 100%, and the results are continually written to disk. The computation usually takes tens of hours. The strange thing is that the Rscript.exe processes gradually become inactive (CPU usage = 0) one by one, and the corresponding CPUs become inactive too. After two days, only half of the Rscript.exe processes are still active (judging by CPU usage), and the overall CPU usage has dropped to 50%, yet the work is far from finished. As time goes on, more and more Rscript.exe processes go inactive, which makes the job take very long. What is making the processes and CPU cores go inactive?

My computer has 46 logical cores. I am using R 3.4.0 from RStudio on 64-bit Windows 7. The 'test' variable below is a 32000 x 2 matrix, and myfunction solves several differential equations.

Thanks.

    library(snowfall)
    sfInit(parallel = TRUE, cpus = 46)
    Sys.time()
    sfLibrary(deSolve)
    sfExport("myfunction", "test")
    res <- sfApply(test, 1, function(x) { myfunction(x[1], x[2]) })
    sfStop()
    Sys.time()
yan
  • 21
  • 1
  • What about memory usage? Is enough RAM available? There's not much to go by here, but you could try running only a few tasks at a time and see if they pass. Start increasing the number of tasks until you hit the bottleneck. – Roman Luštrik Jun 16 '17 at 05:37
  • Thanks. RAM is not the issue; only 10 GB of the 64 GB total is in use. I could try that, but the problem is that the processes go inactive gradually. The tasks keep running, just on fewer and fewer CPUs. It is as if something during the computation puts the cores to sleep one by one. – yan Jun 16 '17 at 07:57
  • Sorry, I'm out of ideas. Perhaps you could use another parallel tool, like `parallel` or `foreach`? – Roman Luštrik Jun 16 '17 at 08:09
  • Some errors can kill a core. Also, you should check that each iteration actually completes in a reasonable time. I often have data that seems balanced initially, but operations on the data are actually very unbalanced. – CPak Jun 16 '17 at 22:12
  • 1
    Thanks. It is exactly as you mentioned. After some digging, the cause appears to be the unbalanced time each job needs: the more time-consuming jobs sit in the later part of the task queue. I think sfApply first splits the tasks into as many chunks as there are CPUs, in order, and assigns one chunk to each CPU, which leads to unbalanced finishing times. My solution is to use mclapply on Linux instead, because mclapply relies on forking, which is not supported on Windows. Its random/dynamic assignment of tasks should make my computation faster. Thanks again. – yan Jun 17 '17 at 09:06
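
For reference, here is a minimal sketch of the mclapply approach mentioned in the last comment (forking works on Linux/macOS only; mc.preschedule = FALSE and the test_rows helper are assumptions chosen here to get fully dynamic task assignment, not part of the original comment):

    library(parallel)

    ## One list element per row of test, so each row is a separate task
    test_rows <- lapply(seq_len(nrow(test)), function(row) test[row, ])

    ## With mc.preschedule = FALSE, rows are handed out dynamically: the next
    ## row starts as soon as a core becomes free, so slow jobs late in the
    ## queue no longer leave cores idle
    res <- mclapply(test_rows, function(x) myfunction(x[1], x[2]),
                    mc.cores = 46, mc.preschedule = FALSE)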

1 Answer

2

What you're describing sounds reasonable since snowfall::sfApply() uses snow::parApply() internally, which chunks up your data (test) into (here) 46 chunks and sends each chunk out to one of the 46 R workers. When a worker finishes its chunk, there is no more work for it and it'll just sit idle while the remaining chunks are processed by the other workers.
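
As a rough illustration of that kind of up-front split (a sketch using parallel::splitIndices(); snow's exact internal helper may differ, but the idea is the same):

    library(parallel)

    ## 32000 rows divided into 46 index chunks, one chunk per worker
    chunks <- splitIndices(32000, 46)
    length(chunks)          # 46
    range(lengths(chunks))  # roughly 32000 / 46, i.e. about 696 rows per chunk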

What you want to do is split your data into smaller chunks, so that each worker processes more than one chunk on average. I don't think that is possible with snowfall. The parallel package, which is part of R itself and supersedes the snow package (which snowfall relies on), provides parApply() as well as load-balancing variants such as parLapplyLB(), which hand out the work in much smaller chunks, down to one element (here, one row of test) at a time. See help("parApply", package = "parallel") for details.
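
For comparison, here is a minimal sketch of what that could look like with parallel used directly; the cluster setup and the test_rows helper are assumptions based on the question's code, not a tested drop-in replacement:

    library(parallel)

    cl <- makeCluster(46)                        # PSOCK cluster, as used on Windows
    clusterEvalQ(cl, library(deSolve))           # load deSolve on every worker
    clusterExport(cl, c("myfunction", "test"))   # ship the function and the data

    ## One list element per row, so work can be handed out row by row
    test_rows <- lapply(seq_len(nrow(test)), function(row) test[row, ])

    ## Load-balanced: each worker fetches its next row as soon as it finishes one
    res <- parLapplyLB(cl, test_rows, function(x) myfunction(x[1], x[2]))

    stopCluster(cl)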

The future.apply package (I'm the author) gives you the option to control how much the data is split up. It doesn't provide an apply() version, but it does provide an lapply() version that you can use (and that is how parApply() works internally). For instance, your example with one chunk per worker would be:

    library(future.apply)
    plan(multisession, workers = 46L)

    ## Coerce the matrix into a list with one element per matrix row
    test_rows <- lapply(seq_len(nrow(test)), FUN = function(row) test[row, ])

    res <- future_lapply(test_rows, FUN = function(x) {
      myfunction(x[1], x[2])
    })

which defaults to

    res <- future_lapply(test_rows, FUN = function(x) {
      myfunction(x[1], x[2])
    }, future.scheduling = 1.0)

If you want to split up the data so that each worker processes one row at a time (cf. parallel::parLapplyLB()), you do that as:

    res <- future_lapply(test_rows, FUN = function(x) {
      myfunction(x[1], x[2])
    }, future.scheduling = Inf)

By setting future.scheduling in [1, Inf], you can control how large the average chunk is. For instance, future.scheduling = 2.0 will have each worker process, on average, two chunks of data before future_lapply() returns.
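
Following the same pattern as the calls above, that could be written as:

    res <- future_lapply(test_rows, FUN = function(x) {
      myfunction(x[1], x[2])
    }, future.scheduling = 2.0)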

EDIT 2021-11-08: future_lapply() and friends are now in the future.apply package (they were originally in the future package).

HenrikB
  • 6,132
  • 31
  • 34