
Consider the following piece of code running on Windows with the doSNOW package:

result.dt <- foreach(j = 1:nrow(keys), 
                    .combine = function(...) rbindlist(list(...)), 
                    .packages = c('data.table'), 
                    .multicombine = TRUE) %dopar% {
    data.select <- data[keys[j], on = colnames(keys)]
    foo(data.select)
}

keys is a data.table with around 2000 rows, and foo is a function that takes a fair amount of time to run (10 s to 3 min) and returns a data.table with a single row.

When I turn on .verbose = TRUE, I can see the foo calls running on the different cores as expected (taking, say, 30 minutes). However, after all the foreach tasks finish, the code spends another 30 minutes (sometimes even longer) just combining the 2000 data.tables.

Another thing I noticed: with .verbose = TRUE, the output always showed numValues: .., numResults: .., stopped: FALSE, even at the end of the foreach. I was expecting to see stopped: TRUE.

Any idea what could be wrong and how to improve the performance?
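
One way to check whether the bind step alone could account for the delay is to time it in isolation. This is only a minimal sketch: `pieces` is a hypothetical stand-in for the 2000 single-row, three-column results of foo, not the real data.

```r
library(data.table)

# Hypothetical stand-in for the 2000 single-row results returned by foo.
pieces <- lapply(1:2000, function(j) {
    data.table(id = j, value = runif(1), label = "x")
})

# Time only the bind step, with no parallel backend involved.
elapsed <- system.time(combined <- rbindlist(pieces))[["elapsed"]]
print(elapsed)
```

If this finishes quickly while the real run still stalls after the workers are done, the delay is probably not in rbindlist itself.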

Chen Chen
  • Let `foreach` return a list (by not specifying `.combine`) and do `result.dt <- rbindlist(result.dt)` afterwards? I believe your anonymous function might force a deep copy of all data.tables. – Roland Jan 25 '18 at 07:26
  • @Roland I tried that approach already but ran into the same thing. – Chen Chen Jan 25 '18 at 13:11
  • Something is strange here. How many columns has that single-row data.table? – Roland Jan 25 '18 at 13:27
  • @Roland Only three. – Chen Chen Jan 25 '18 at 13:31
  • `rbindlist` is instantaneous for 2000 1x3 data.tables. What happens if you change the last line of the loop to `as.data.frame(foo(data.select))`? – Roland Jan 25 '18 at 13:49
  • @Roland It showed the same behavior. – Chen Chen Jan 25 '18 at 14:14
  • Sorry, I'm out of ideas. You'll need to provide a reproducible example for further help. – Roland Jan 25 '18 at 14:15
  • @Roland Thanks anyway. One more thing to notice is that the combination always happens at the end of everything. I thought it was supposed to happen every 100 iterations with `.multicombine = TRUE` – Chen Chen Jan 25 '18 at 14:34
  • What makes you think that? I suspect it simply switches between use of `Reduce` and `do.call`. – Roland Jan 25 '18 at 14:42
  • Let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/163880/discussion-between-chen-chen-and-roland). – Chen Chen Jan 25 '18 at 14:46
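
For reference, the approach suggested in the first comment (no `.combine`, then a single `rbindlist`) might look roughly like the sketch below. The `data`, `keys`, and `foo` defined here are small hypothetical stand-ins that only mirror the names in the question, and the two-worker cluster is an arbitrary choice.

```r
library(doSNOW)      # also attaches foreach and snow
library(data.table)

# Hypothetical stand-ins for the question's data, keys, and foo.
data <- data.table(id = rep(1:20, each = 5), x = as.numeric(1:100))
keys <- unique(data[, .(id)])
foo <- function(d) data.table(id = d$id[1], n = nrow(d), total = sum(d$x))

cl <- makeCluster(2)
registerDoSNOW(cl)

# No .combine: foreach returns a plain list, one element per iteration.
result.list <- foreach(j = seq_len(nrow(keys)),
                       .packages = "data.table") %dopar% {
    data.select <- data[keys[j], on = colnames(keys)]
    foo(data.select)
}

stopCluster(cl)

# Bind once, after the parallel section has finished.
result.dt <- rbindlist(result.list)
```

As the thread notes, this did not remove the delay in the original case, so it is a baseline to compare against rather than a confirmed fix.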
