I'm using Python's multiprocessing on a cluster, on a single node that has 20 cores. Although I reserve only 10 CPUs (-n 1 and -c 10 in Slurm) and start the multiprocessing Pool with 8 workers, the cluster monitor (Ganglia) shows a load well above the number of CPUs reserved: with this setting, around 30 processes end up running on the node.
I don't understand why I'm getting more processes than the number of workers I instantiate. The problem gets worse if I reserve 20 CPUs and let Pool pick the number of workers automatically: the process count jumps to about 100. The real problem is that the code cannot run under these conditions, because the admins cancel (after a few hours) any job that spawns more processes than the number of CPUs on its node.
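For what it's worth, here is a quick check I could drop at the top of the script (assuming a Linux node) to compare what the node has with what Slurm actually hands to the task:

import os
import multiprocessing as mp

print(mp.cpu_count())                # CPUs on the whole node (20 here); Pool() uses this by default
print(len(os.sched_getaffinity(0)))  # CPUs the task is actually bound to (10, if Slurm pins the task)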
My code basically solves a large linear algebra problem that can be split into independent blocks, and its structure is like this:
import pandas as pd
import numpy as np
import multiprocessing as mp
class storer:
    res = pd.DataFrame(columns=['A','B',...])

def job(manuf, week):
    # Some intensive job using the global data
    # and np.linalg
    return res

def child_initialize(_data):
    global data
    data = _data

def err_handle(err):
    raise err

def join_results(job_res):
    storer.res = storer.res.append(job_res, ignore_index=True)

def run_jobs(data, grid, output_file):
    pool = mp.Pool(8, initializer=child_initialize,
                   initargs=(data, ))
    for idx, row in grid.iterrows():
        pool.apply_async(job,
                         args=(row[0], row[1]),
                         callback=join_results, error_callback=err_handle)
    pool.close()
    pool.join()
    storer.res.to_csv(output_file)
    return True

if __name__ == "__main__":
    # get data, grid, and output_file from sys.argv and from some csv
    run_jobs(data, grid, output_file)
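The part I elided in the main block is just argument handling and CSV loading, roughly along these lines (the paths and column names are placeholders, not my real ones):

import sys
import pandas as pd

if __name__ == "__main__":
    data = pd.read_csv(sys.argv[1])    # large shared dataset (placeholder path)
    grid = pd.read_csv(sys.argv[2])    # one row per (manuf, week) block to solve
    output_file = sys.argv[3]
    run_jobs(data, grid, output_file)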