I have a processing job which uses MPI for parallelisation, but is (in this case) running on a single host. Each time I run a job, it "consumes" a number of cgroup "pids". Specifically, each time I run the job,
/sys/fs/cgroup/pids/user.slice/user-1000.slice/pids.current
increases (not surprisingly), but when the job completes it settles at a larger value than before the run. The increase in pids.current
equals the number of MPI processes launched. My job launches about 30 MPI processes and I have to run it hundreds of times, so pids.current
rapidly grows until it exceeds pids.max
and no further processes for the specific user can be created. As a workaround I have been increasing pids.max
, but that is a pretty poor solution.
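For reference, this is roughly how I observe the problem and apply the workaround (the job name below is a placeholder, and the limit I write is just an arbitrary large number):

cat /sys/fs/cgroup/pids/user.slice/user-1000.slice/pids.current   # note the value
mpirun -np 30 ./my_processing_job
cat /sys/fs/cgroup/pids/user.slice/user-1000.slice/pids.current   # now ~30 higher, even though the job has exited
echo 100000 | sudo tee /sys/fs/cgroup/pids/user.slice/user-1000.slice/pids.max   # workaround: raise the limit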
I launch the job with mpirun
and the cgroup setup is default for Debian.
I have demonstrated the same problem with an unrelated "mpi-helloworld" program from
https://github.com/wesleykendall/mpitutorial
so I am confident it is not my software. I have a machine with identical hardware, running the same Linux kernel/distribution, which was set up by someone else and does not show this problem. One other machine (different hardware, same Linux kernel) also does not show the problem.
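The reproduction with the tutorial code looks roughly like this (the exact path inside the repository may differ slightly from what I show here):

git clone https://github.com/wesleykendall/mpitutorial
cd mpitutorial/tutorials/mpi-hello-world/code
mpicc mpi_hello_world.c -o mpi_hello_world
cat /sys/fs/cgroup/pids/user.slice/user-1000.slice/pids.current
mpirun -np 30 ./mpi_hello_world
cat /sys/fs/cgroup/pids/user.slice/user-1000.slice/pids.current   # higher by roughly the number of ranks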
I can find no reference anywhere online of someone seeing the same problem.
I am running:
Debian GNU/Linux 9
Linux 4.9.0-8-amd64 #1 SMP Debian 4.9.110-3+deb9u5 (2018-09-30) x86_64 GNU/Linux
Intel(R) Xeon(R) CPU E5-2630 v4 @ 2.20GHz
libopenmpi-dev 2.0.2-2
libopenmpi2:amd64 2.0.2-2
openmpi-bin 2.0.2-2
openmpi-common 2.0.2-2
Can anyone suggest what is wrong or where to look?