I've got an MPI job that I run in Slurm using an sbatch script which looks something like:

# request 384 processors across 16 nodes for exclusive use:
#SBATCH --exclusive
#SBATCH --ntasks-per-node=24
#SBATCH -n 384
#SBATCH -N 16
#SBATCH --time 3-00:00:00
mpirun myprog

I want to monitor the memory/CPU usage and some other behaviour of the "myprog" processes. I've written a simple script (call it "monitor") which can do this, but I'm stumped on how to use sbatch to run ONE copy of it on each allocated node, at the same time as "myprog".

I think I need to modify the above to something like:

...
srun monitor
mpirun myprog

But I'm confused about a) whether that means "monitor" will run in the background and b) how I can control where "monitor" runs.

lost

1 Answer

To run monitor 'in the background', so that the srun call does not block and the subsequent mpirun command can start, simply add an ampersand (&) at the end of the srun line.

To make sure monitor runs only on the 'master node' of the allocation (the node that executes the batch script), just remove the srun command and launch monitor directly.

If you need that program to run on a specific node, use srun's -n1 and --nodelist options (you will probably need to get the list of allocated nodes first). You should also consider using the --overcommit option of srun to avoid dedicating a full CPU to your monitoring program, which I assume is not CPU-bound.
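
For the original goal of one copy of monitor per allocated node, the pieces above could be combined roughly as follows. This is only a sketch: the `run_with_monitor` function is a hypothetical wrapper introduced here for illustration, "monitor" and "myprog" are the placeholder names from the question, and whether --overcommit alone is enough for the two steps to share CPUs depends on the Slurm version and site configuration.

```shell
#!/bin/bash
#SBATCH --exclusive
#SBATCH --ntasks-per-node=24
#SBATCH -n 384
#SBATCH -N 16
#SBATCH --time 3-00:00:00

# Hypothetical wrapper: start the monitors, run the MPI job, stop the monitors.
run_with_monitor() {
    # One monitor task per allocated node, started in the background (&).
    # --overcommit avoids reserving a full CPU for each monitor task.
    # Assumption: some Slurm versions may also need --overlap on this step
    # so that the MPI step can use the same CPUs; check the local srun man page.
    srun --nodes="$SLURM_JOB_NUM_NODES" --ntasks="$SLURM_JOB_NUM_NODES" \
         --ntasks-per-node=1 --overcommit monitor &
    monitor_pid=$!

    # Run the real job in the foreground and remember its exit status.
    status=0
    mpirun myprog || status=$?

    # Signal the background srun once the MPI run is over, then reap it.
    kill "$monitor_pid" 2>/dev/null || true
    wait "$monitor_pid" 2>/dev/null || true
    return "$status"
}

# Only launch when running inside a Slurm allocation.
if [ -n "${SLURM_JOB_ID:-}" ]; then
    run_with_monitor
fi
```

Dropping the srun line and writing `monitor &` instead gives the 'master node only' variant described above.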

damienfrancois
  • Ah thanks. So if I did (gah stupid single-line comments, hope this makes sense): `<#SBATCH directives, as above>\n monitor & \n mpirun myprog` then monitor would only run on the "primary" node would it? Would it still get a whole CPU? (you're right that I don't need that - it's just a script scraping "top") – lost Dec 01 '15 at 14:52
  • Yes, it would run on the "primary" node, and it would be placed in the cgroup or cpuset (depending on the configuration) dedicated to that job on that node, so in practice it would share one of the CPUs used by the MPI processes. – damienfrancois Dec 02 '15 at 12:16