Consider the following piece of code running on Windows with the doSNOW
package:
result.dt <- foreach(j = 1:nrow(keys),
                     .combine = function(...) rbindlist(list(...)),
                     .packages = c('data.table'),
                     .multicombine = TRUE) %dopar% {
  data.select <- data[keys[j], on = colnames(keys)]
  foo(data.select)
}
Here keys is a data.table with around 2000 rows, and foo is a function that takes a fair amount of time to run (10s - 3m) and returns a data.table with a single row.
When I turn on .verbose = TRUE, I can see the foo function running on the different cores as expected (say 30m in total). However, after all the foreach tasks finish, the code takes another 30m (sometimes even longer) just to combine the 2000 single-row data.tables.
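One way to check whether that extra time really goes into the .combine step would be to drop .combine, let foreach return a plain list, and call rbindlist once afterwards. The following is only a sketch under assumptions: keys, data, and foo below are small hypothetical stand-ins for the real objects, and %do% is used so that it runs without a cluster (with a registered doSNOW backend, %dopar% and .packages would be used as in the original call).

```r
library(foreach)
library(data.table)

# Hypothetical stand-ins for the real inputs, so the sketch runs on its own.
keys <- data.table(id = 1:20)
data <- data.table(id = rep(1:20, each = 5), value = as.numeric(1:100))
foo  <- function(dt) data.table(id = dt$id[1], total = sum(dt$value))

# No .combine: foreach returns a plain list of the per-iteration results.
# %do% runs sequentially here; this is an assumption made only to keep the
# sketch self-contained.
result.list <- foreach(j = 1:nrow(keys)) %do% {
  data.select <- data[keys[j], on = colnames(keys)]
  foo(data.select)
}

# Bind all single-row tables in one call and time that step on its own.
combine.time <- system.time(result.dt <- rbindlist(result.list))
print(combine.time)
```

As far as I understand, with .multicombine = TRUE foreach passes at most .maxcombine results (100 by default) to the combine function per call, so the rbindlist wrapper in the original code would be called repeatedly on a growing table rather than once. Timing the single bind separately should show whether that is where the 30m goes.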
Another thing to note: with .verbose = TRUE, the output always shows numValues: .., numResults: .., stopped: FALSE, even at the end of the foreach. I was expecting to see stopped: TRUE.
Any idea what could be wrong and how to improve the performance?