
I want to run code using multiprocessing on a server managed by Slurm. I want to limit the number of CPUs available and have the code create a child process for each of them.

My code can be simplified like this:

def Func(ins):
  ###
  things...
  ###
  return var

if __name__ == '__main__' :
  from multiprocessing import Pool
  from multiprocessing import active_children
  from multiprocessing import cpu_count

  p = Pool()
  print("active cpus = ", cpu_count())
  print("open process = ", p._processes)
  print("active_children = ", len(active_children()))
  results = p.map(Func, range(2000))
  p.close()

  exit()

driven by this batch script:

#!/bin/bash

#SBATCH --time=1:00:00
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --cpus-per-task=48
#SBATCH --mem=40000 # Memory per node (in MB).

module load python 
conda activate myenv
python3 test.py

echo 'done!'

What I get is that the code always runs on the maximum number of CPUs (272), whatever combination of parameters I try:

active cpus =  272
open process =  272
active_children =  272
done!

I launch the job with the command

sbatch job.sh

What am I doing wrong?

  • According to the documentation, `cpu_count` "Return[s] the number of CPUs in the system". And when you initialize the pool with `Pool()`, the documentation says that "processes is the number of worker processes to use. If processes is None then the number returned by os.cpu_count() is used". So your code, by definition, uses all the CPUs in the system. If you want a different behavior, you will have to implement it. – Poshi May 22 '23 at 18:48
  • But in theory, isn't cpus-per-task supposed to regulate the number of CPUs seen by the code? In C++, parallelising with pragma, it works. I also tried setting ntasks-per-node=48 and cpus-per-task=1, but nothing happened. – Simone Sartori May 22 '23 at 19:28
  • If some control mechanism is used (like cgroups), the OS will prevent you from using more than the assigned CPUs. But this does not change the number of CPUs in the system (that is fixed by the hardware), which is what the methods you used return. That parameter regulates how many resources you are allowed to use, but it does not change what the code sees when it asks how many CPUs the system has. You can start as many processes as you want, but if you are restricted to a smaller number of CPUs, they will have to share them so all of them can finish. – Poshi May 22 '23 at 21:41
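
As the comments explain, os.cpu_count() reports the hardware total, not the Slurm allocation. On Linux, a way to see how many CPUs the process is actually allowed to run on is sketched below (os.sched_getaffinity is Linux-only, and whether it reflects the allocation depends on how Slurm applies cgroups/affinity on your cluster):

import os

# Total CPUs physically present on the node (fixed by hardware).
print("cpu_count =", os.cpu_count())

# CPUs this process may actually run on, e.g. after Slurm applies
# a CPU affinity mask. Linux-only.
print("allowed cpus =", len(os.sched_getaffinity(0)))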

1 Answer


Your Python code is responsible for creating the desired number of processes based on the Slurm allocation.

If you want, as is often the case, to have one process per allocated CPU, your code should look like this:

if __name__ == '__main__' :
  import os
  from multiprocessing import Pool
  from multiprocessing import active_children
  from multiprocessing import cpu_count

  ncpus = int(os.environ['SLURM_CPUS_PER_TASK'])
  p = Pool(ncpus)

  print("active cpus = ", cpu_count())
  print("open process = ", p._processes)
  print("active_children = ", len(active_children()))
  results = p.map(Func, range(2000))
  p.close()

  exit()

The SLURM_CPUS_PER_TASK environment variable will hold the value you specify in the #SBATCH --cpus-per-task=48 line in the submission script.
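
If the script might also run outside of a Slurm job (where the variable is unset), a small fallback sketch, assuming you want to default to all visible CPUs, could be:

import os

# Use the Slurm allocation when present; otherwise fall back to the
# total CPU count (illustrative fallback, not part of the answer above).
ncpus = int(os.environ.get('SLURM_CPUS_PER_TASK', os.cpu_count()))
print("pool size =", ncpus)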

damienfrancois