MPI processes are invoked by mpirun/mpiexec, not by the resource manager or scheduler (Torque/Maui). How can cgroups be used to isolate memory and cpuset usage for every MPI process? Note that we cannot modify the MPI library (Open MPI/MPICH2), but modifying the resource manager and scheduler is acceptable. Thanks.
-
Open MPI uses various mechanisms to start remote processes, implemented in the `plm` framework. If Open MPI was compiled with `tm` support, it launches all remote processes through the `tm` interface of Torque, and it is then up to the resource manager to set the correct limits. To enable `tm` support, configure Open MPI with the `--with-tm=/path/to/torque/install/dir` option. – Hristo Iliev Nov 05 '12 at 09:46
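One way to check whether a given Open MPI build includes the `tm` launcher is to look for a `tm` component of the `plm` framework in the output of `ompi_info`. A minimal Python sketch follows. It assumes the usual `MCA <framework>: <component> (...)` line layout, and the helper names `has_tm_plm` and `installed_open_mpi_has_tm` are hypothetical:

```python
import re
import subprocess


def has_tm_plm(ompi_info_text):
    """Return True if the text lists a 'tm' component in the plm framework."""
    # Assumed line layout: "MCA plm: tm (MCA v2.0, API v2.0, Component v1.6.3)"
    pattern = re.compile(r"^\s*MCA\s+plm:\s+tm\b", re.MULTILINE)
    return bool(pattern.search(ompi_info_text))


def installed_open_mpi_has_tm():
    """Run ompi_info from PATH and check its output (requires Open MPI)."""
    result = subprocess.run(
        ["ompi_info"], capture_output=True, text=True, check=True
    )
    return has_tm_plm(result.stdout)
```

If the check fails, mpirun would presumably fall back to another launcher such as rsh/ssh, in which case the remote processes are not children of the Torque daemon on each node.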
-
Thanks. I've found that Open MPI does use its tm module to start remote processes, just as pbsdsh does, although I don't know whether Open MPI calls pbsdsh or implements its own tm module. The problem is solved: I modified the resource manager (Torque) to isolate resources between jobs with cgroups. Thanks a lot. – levin li Nov 18 '12 at 03:03
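The actual Torque patch is not shown in this thread. As a rough illustration of the idea only, here is a Python sketch of what a resource manager could do per job under cgroup v1: create a memory and a cpuset cgroup, write the limits, and attach the job's first process so that everything it forks (including processes launched through `tm`) inherits the same cgroup. The mount point `/sys/fs/cgroup`, the `torque` parent group, and the function names are assumptions:

```python
import os

# Assumed cgroup v1 mount point; distributions may mount controllers elsewhere.
CGROUP_ROOT = "/sys/fs/cgroup"


def _write(path, value):
    """Write a single value to a cgroup control file."""
    with open(path, "w") as f:
        f.write(str(value))


def create_job_cgroup(job_id, mem_limit_bytes, cpus, mems="0", root=CGROUP_ROOT):
    """Create per-job memory and cpuset cgroups and set their limits.

    The parent 'torque' cpuset is assumed to be configured already
    (its own cpuset.cpus and cpuset.mems must be non-empty).
    """
    mem_dir = os.path.join(root, "memory", "torque", job_id)
    cpu_dir = os.path.join(root, "cpuset", "torque", job_id)
    os.makedirs(mem_dir, exist_ok=True)
    os.makedirs(cpu_dir, exist_ok=True)
    _write(os.path.join(mem_dir, "memory.limit_in_bytes"), mem_limit_bytes)
    # Both cpuset.cpus and cpuset.mems must be set before tasks can be attached.
    _write(os.path.join(cpu_dir, "cpuset.cpus"), cpus)
    _write(os.path.join(cpu_dir, "cpuset.mems"), mems)
    return mem_dir, cpu_dir


def attach_pid(cgroup_dirs, pid):
    """Move a process into each cgroup; its future children inherit them."""
    for d in cgroup_dirs:
        _write(os.path.join(d, "tasks"), pid)
```

For example, `attach_pid(create_job_cgroup("123.server", 2 * 1024**3, "0-3"), pid)` would cap a job at 2 GiB and pin it to CPUs 0 through 3. Running this against the real hierarchy requires root privileges.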
-
That was my point exactly: one should make such modifications to the resource manager, not to the MPI library. It's great that you have it working. Since Torque is open source, would you care to contribute your modifications to the project? I'm seeing lots of interest in using cgroups these days; we are trying to make them work with LSF right now. – Hristo Iliev Nov 18 '12 at 12:24
-
I'd like to share my code. Can you tell me where I can submit the patch? Is there a development mailing list for Torque? – levin li Nov 19 '12 at 03:37
-
As a matter of fact, there is a [mailing list](http://www.supercluster.org/mailman/listinfo/torqueusers). Adaptive Computing says that patches are welcome there. – Hristo Iliev Nov 19 '12 at 06:52
-
Is there any progress made on this? Where can we find these patches? – Jens Timmerman Mar 10 '14 at 10:26