I am using concurrent.futures.ProcessPoolExecutor to run Python code in parallel. Basically, what I do is
    with concurrent.futures.ProcessPoolExecutor(max_workers=10) as executor:
        futures = {executor.submit(my_function, i) for i in range(n)}
        for fut in concurrent.futures.as_completed(futures):
            print(fut.result())
This works fine for small values of n, but for larger n it takes up a lot of RAM. I suspected that storing the set (or list) of futures was consuming the RAM, so I tried not storing it and instead moved whatever I wanted to do with the results into my_function itself, like
    with concurrent.futures.ProcessPoolExecutor(max_workers=10) as executor:
        for i in range(n):
            executor.submit(my_function, i)
But it still takes up a lot of RAM.
With some more digging, I found this. I understood that the for loop submits all the tasks immediately, but executing them takes time, so the tasks that have been submitted but not yet executed are kept in RAM.
Intuitively, I understood that one should not submit all the tasks at once, but rather submit them gradually as earlier tasks complete. I don't want to add any sleep/delay in the loop. Is there a better way to do that? I also did not really understand, with the map method instead of submit, what the chunksize argument does and how to decide what value to assign to it.
Is there a better or more elegant way to do it? Or am I completely wrong? I used GNU parallel before, and it doesn't cause such large RAM problems. I want a Python-only solution.