I am running a job on a multinode cluster with Slurm, OpenMPI, and Python (Anaconda with MKL). When I submit the job, everything seems to work as expected. However, if I log in to one of the nodes running the job and use htop to inspect the running processes, I see the processes I started, and for each one about 10 additional "clone" processes that occupy the same memory as the original but show 0% CPU load (only the PID and the CPU% differ; everything else is identical).
Can anyone explain this behavior?
Thanks!
P.S. Here is the batch script I use to submit the jobs:
#!/bin/zsh
#SBATCH --job-name="DSC on Natims"
#SBATCH -n 16
#SBATCH -N 8
#SBATCH --ntasks-per-node=2
#SBATCH --mem-per-cpu=20G
#SBATCH --output="log_dsc%j.out"
#SBATCH --error="log_dsc%j.err"
mpiexec -iface bond0 python dsc_run.py
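In case it helps: I suspect (this is only my guess, not something I have confirmed) that the "clones" might be threads of the same process, since htop lists each thread as its own row by default and Linux exposes every thread as an entry under /proc/&lt;pid&gt;/task. A minimal sketch (pure Python, Linux only) showing how extra threads appear as separate task entries:

```python
import os
import threading
import time

def idle():
    # Sleep so the threads stay alive while we count task entries.
    time.sleep(2)

# Each Linux thread gets its own entry under /proc/<pid>/task;
# htop shows these entries as separate rows by default.
before = len(os.listdir("/proc/self/task"))

threads = [threading.Thread(target=idle) for _ in range(10)]
for t in threads:
    t.start()

after = len(os.listdir("/proc/self/task"))
print(f"tasks before: {before}, after starting 10 threads: {after}")

for t in threads:
    t.join()
```

If the clones I see are the same kind of thing, they would share the parent's memory and PID-adjacent IDs, which would match what I observe in htop.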