With PBSpro I can request resources to run my job. My parallel cluster job boils down to running the same file multiple times, each time with a different index / job ID. Each task spawns its own sub-processes and uses 4 CPUs in total. The job is embarrassingly parallel, with each task independent of the others, and is thus a good fit for the GNU parallel tool. To make the best use of the cluster and squeeze my tasks in wherever there is space, I place a resource request to PBS as follows:
#PBS -l select=60:ncpus=4:mpiprocs=1
The resulting $PBS_NODEFILE then contains a list of the hosts assigned to the job.
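For context, here is a minimal sketch of how the request and the node file fit together in a job script (the host names shown are made-up examples):

#!/bin/bash
#PBS -l select=60:ncpus=4:mpiprocs=1

# With mpiprocs=1 the node file holds one line per chunk, so a host
# appears once for every chunk PBS placed on it, e.g.:
cat $PBS_NODEFILE
# node017
# node017
# node023
# ...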
The problem is that the PBSpro job manager can place several chunks of this request on the same node, or only one, and this per-node allocation somehow has to be passed on to GNU parallel. Doing so with --sshloginfile $PBS_NODEFILE does not carry over the varying resources available on each node (and it appears GNU parallel only uses the unique host names from this list).
What goes wrong: GNU parallel sees X cores (the physical core count of the host / node) regardless of whether only one chunk was assigned to that host. Limiting the number of jobs per host leaves cores idle on hosts that received several chunks, while raising the limit oversubscribes the cores on hosts that received only one.
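To illustrate the kind of mapping I am after, here is a sketch of one idea I have not verified: collapse the duplicate entries in $PBS_NODEFILE into GNU parallel's N/host sshlogin syntax, so each host advertises as many job slots as the number of chunks PBS placed on it (jobslots.txt and mytask.sh are placeholder names):

# Each line in the node file is one 4-CPU chunk, so the per-host
# line count is the number of tasks that host should run at once.
sort $PBS_NODEFILE | uniq -c | awk '{print $1 "/" $2}' > jobslots.txt

# The N/host form overrides the core count GNU parallel would
# otherwise detect, making N the default number of job slots there.
# Assumes mytask.sh sits on a filesystem shared by all nodes.
parallel --sshloginfile jobslots.txt ./mytask.sh {} ::: $(seq 1 60)

Whether this interacts correctly with GNU parallel's remote CPU detection is exactly the part I am unsure about.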
The problem boils down to:
- How can one efficiently run parallel tasks through PBSpro,
- each task using more than 1 CPU,
- over a random (PBS-allocated) selection of nodes,
- each with a varying number of assigned resources,
- that don't necessarily match the actual physical resources of the node.