Python concurrent.futures.ProcessPoolExecutor crashing with full RAM
Program description
Hi, I've got a computationally heavy function which I want to run in parallel. The function is a test that accepts as inputs:
- a DataFrame to test on
- parameters based on which the calculations will be run.
The return value is a short list of calculation results.
I want to run the same function in a for loop with different parameters and the same input DataFrame, essentially brute-forcing a search for the optimal parameters for my problem.
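For illustration, a minimal sketch of the function's shape is below; the body is a placeholder stand-in for the real, much heavier computation:
import pandas as pd

def func(df: pd.DataFrame, params: tuple) -> list:
    # Placeholder calculation: the real test does heavy work on df
    arg1, arg2, arg3 = params
    score = (df.iloc[:, 0] * arg1 + arg2).clip(upper=arg3).sum()
    return [arg1, arg2, arg3, score]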
The code I've written
I'm currently running the code concurrently with ProcessPoolExecutor from the concurrent.futures module.
import concurrent.futures
from itertools import repeat
import pandas as pd
from my_tests import func
parameters = [
    (arg1, arg2, arg3),
    (arg1, arg2, arg3),
    ...
]
large_df = pd.read_csv(csv_path)
with concurrent.futures.ProcessPoolExecutor() as executor:
    # executor.map yields the results directly, not Future objects,
    # so there is no .result() call here
    for test_result in executor.map(func, repeat(large_df.copy()), parameters):
        ...
The problem
The problem I face is that I need to run a large number of iterations, but my program crashes almost instantly.
In order for it not to crash, I need to limit it to at most 4 workers, which is only a quarter of my CPU cores.
with concurrent.futures.ProcessPoolExecutor(max_workers=4) as executor:
    ...
I figured out that my program crashes because it fills up the RAM (16 GB). What I found weird is that with more workers it gradually ate more and more RAM, which it never released, until it crashed.
Instead of passing a copy of the DataFrame, I tried passing the file path and reading the CSV inside each task, but apart from slowing the program down, it didn't change anything; roughly, it looked like the sketch below.
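To be concrete, that attempt looked roughly like this; read_and_test is a hypothetical wrapper name I'm using here for illustration, not code from my actual project:
import concurrent.futures
from itertools import repeat

import pandas as pd

from my_tests import func

def read_and_test(path, params):
    # Each worker re-reads the CSV itself instead of receiving a pickled DataFrame
    df = pd.read_csv(path)
    return func(df, params)

with concurrent.futures.ProcessPoolExecutor() as executor:
    for test_result in executor.map(read_and_test, repeat(csv_path), parameters):
        ...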
Do you have any idea why this problem occurs and how to solve it?