I have a setup consisting of 3 workers and a management node, which I use for submitting tasks. I would like to run a setup script concurrently on all workers:

bsub -q queue -n 3 -m 'h0 h1 h2' -J "%J_%I" mpirun setup.sh

As far as I understand, I could use the 'ptile' resource constraint to force execution on all workers:

bsub -q queue -n 3 -m 'h0 h1 h2' -J "%J_%I" -R 'span[ptile=1]' mpirun setup.sh

However, occasionally my script is executed several times on the same worker.

Is this expected behavior, or is there a bug in my setup? Is there a better way to enforce execution on every worker?

CaptainTrunky

1 Answer

Your understanding of span[ptile=1] is correct. LSF will use only one slot per host for your job. If fewer hosts are available than -n requests, the job will pend until enough hosts free up.
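One way to confirm the placement from inside the job is to look at the host list LSF hands to it. The following is a minimal sketch, assuming the job environment provides LSB_HOSTS as a space-separated list with one entry per slot; the helper name duplicate_hosts is hypothetical and not part of LSF.

```shell
# Hypothetical helper: print every host name that occurs more than once
# in a space-separated host list. No output means every host is unique,
# which is what span[ptile=1] is expected to produce.
duplicate_hosts() {
    printf '%s\n' "$1" | tr -s ' ' '\n' | sort | uniq -d
}

# Assumed usage inside the job script (LSB_HOSTS is set by LSF at run time):
#   duplicate_hosts "$LSB_HOSTS"
```

If this prints nothing for a given job, LSF allocated distinct hosts, and any repeated execution is more likely to come from how the script is launched or how its output is collected.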

However, occasionally my script is executed several times on the same worker.

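I suspect that it's something in your script or in how you read its output. For example, with -o LSF appends to the stdout file, so output left over from earlier runs can make it look as if the script ran several times on one host. Use -oo to overwrite the file instead.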
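To tell stale output apart from genuine repeated runs, each run can tag its output with the host, the job ID, and the process ID. This is only a sketch: the file name setup.%J.out and the function tag_run are hypothetical, and it assumes LSF sets LSB_JOBID in the job environment.

```shell
# Assumed submission: -oo overwrites the output file rather than appending,
# and %J in the file name expands to the job ID.
#   bsub -q queue -n 3 -m 'h0 h1 h2' -R 'span[ptile=1]' -oo setup.%J.out mpirun setup.sh

# Hypothetical function to call at the top of setup.sh: prints one line
# identifying the host, the LSF job ID, and the shell's process ID.
tag_run() {
    printf '%s job=%s pid=%s\n' "$(uname -n)" "${LSB_JOBID:-unknown}" "$$"
}
```

Lines that share a host but carry different job IDs come from earlier submissions; lines with the same host and job ID but different process IDs point to the script really being started more than once on that host.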

Michael Closson