I'm parallelizing a loop that creates a relatively large dataset at each iteration, using foreach::foreach() with the doParallel backend. When I use foreach the standard way, i.e. returning each dataset and letting foreach combine them all at the end, my RAM usage blows up well before the loop is done. I would thus like each iteration to save the created dataset to a file on disk and drop it from memory right after; essentially, I want each iteration to have only a side effect. My attempt is below, after a sketch of what I mean by the standard way. In the attempt, the .combine = c and the NULL return value make foreach return just NULL at the end.
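The standard way, roughly (a sketch: bind_rows is just a stand-in for whatever combine step is used, and some_big_number is defined in the full example that follows):

foreach(i = 1:10, .combine = bind_rows) %dopar% {
  rep(1, some_big_number) %>% enframe() # every iteration's result is kept and combined
}

And here is my attempt: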
library(tidyverse)
library(foreach)
library(doParallel)

# parallel computation setup
numCores <- detectCores(logical = FALSE)
registerDoParallel(numCores)

some_big_number <- 10

# foreach loop: write each dataset to disk, return NULL
foreach(i = 1:10, .combine = c) %dopar% {
  x <- rep(1, some_big_number) %>% enframe() # task that creates a large object
  filename <- paste0('X', i, '.csv')
  write_csv(x, filename)
  NULL
}
However, all the data created still seems to accumulate in memory while the loop runs, and my RAM still blows up. How can I achieve the desired behavior?
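For reference, the only per-iteration cleanup I can think of is explicitly dropping the object and garbage-collecting on the worker, along these lines (a sketch; rm() and gc() are base R, but I don't know whether calling gc() on a doParallel worker actually releases the memory I'm seeing):

foreach(i = 1:10, .combine = c) %dopar% {
  x <- rep(1, some_big_number) %>% enframe()
  write_csv(x, paste0('X', i, '.csv'))
  rm(x)           # drop the large object from the worker's environment
  invisible(gc()) # ask the worker to garbage-collect before the next task
  NULL
}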