
I am confused about how to submit a job in a multi-user cluster environment. I use a script with the following header:

#BSUB -L /bin/bash
#BSUB -n 10
#BSUB -J jobname
#BSUB -oo log/output.%J
#BSUB -eo log/error.%J
#BSUB -q queue_name
#BSUB -P project_name
#BSUB -R "span[ptile=12]"
#BSUB -W 2:0

mpirun ./someexecutable

My intention is for this job to run on 10 processors (cores) and occupy 1 entire node (each node on the machine has 12 cores), so that the node is fully used by me and no other user interferes with my node. I have explicitly checked, and it looks like my code is using 10 cores at runtime.

Somebody I have been talking with tells me that this way I am actually asking for 120 cores. I think this is not right, but maybe I have misunderstood the instructions:

https://www.ibm.com/support/knowledgecenter/en/SSWRJV_10.1.0/lsf_admin/span_string.html

Should I use this instead?

#BSUB -R "span[hosts=1]" 
simona
  • Your description suggests that what you need is [exclusive scheduling](https://www.ibm.com/support/knowledgecenter/en/SSWRJV_10.1.0/lsf_admin/exclusive_scheduling_lsf.html). Note that if your MPI library has no specific support for LSF, you might end up running on one node only, regardless of how many nodes were actually allocated or how many cores were allocated on that node. – Gilles Gouaillardet Feb 25 '18 at 01:37
  • Actually, I am running on 10 cores; I have checked. Somebody told me that if I use `mpirun -np 10 ./someexecutable` I will end up running on 10 virtual cores for each core. Is that right? – simona Feb 25 '18 at 01:51
  • What about `#BSUB -R "span[hosts=1]"` instead? – simona Feb 25 '18 at 01:58
  • What did you check? The number of allocated cores, or the number of MPI tasks? `mpirun -np 10 a.out` will run 10 MPI tasks, period, regardless of how many nodes and how many cores per node were allocated. By the way, why don't you ask this "somebody" directly? – Gilles Gouaillardet Feb 25 '18 at 03:05
  • I checked at runtime, using `mpi_comm_size`. As for asking them directly: I don't have them around to ask, otherwise I would. – simona Feb 25 '18 at 12:24
  • Keep in mind that you checked how many MPI tasks were started, which might differ from the number of allocated cores. – Gilles Gouaillardet Feb 25 '18 at 13:46
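
To illustrate the distinction drawn in the comments, here is a minimal sketch of how the allocated slot count could be read from inside the job, as opposed to the MPI task count. It assumes LSF exports `LSB_MCPU_HOSTS` in the form `host1 n1 host2 n2 ...`; the helper name `allocated_slots` and the fallback value `nodeA 10` are made up for this example.

```shell
#!/bin/bash
# Sum the slot counts in an LSB_MCPU_HOSTS-style string ("host1 n1 host2 n2 ...").
# allocated_slots is a hypothetical helper, not part of LSF.
allocated_slots() {
    local total=0
    # Split the string into positional parameters: host count host count ...
    set -- $1
    while [ $# -ge 2 ]; do
        total=$(( total + $2 ))
        shift 2
    done
    echo "$total"
}

# Inside a running job LSF is assumed to set LSB_MCPU_HOSTS;
# the fallback "nodeA 10" is an invented value for running this outside LSF.
echo "allocated slots: $(allocated_slots "${LSB_MCPU_HOSTS:-nodeA 10}")"
```

Comparing this number with the value returned by `mpi_comm_size` shows whether the MPI task count matches what the scheduler actually allocated.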

1 Answer


> My intention is for this job to run on 10 processors (cores) and occupy 1 entire node

Yes, you want to use

#BSUB -n 10
#BSUB -R "span[hosts=1]"

This places the whole job on a single host.

> and no other user interferes with my node

You can get exclusive access to the host with

#BSUB -x
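
Putting these directives together with the header from the question, the full script might look like the sketch below. It writes the script to `job.lsf`, a file name chosen only for this example; `queue_name`, `project_name` and `./someexecutable` are the placeholders from the question. Note that, as far as I know, `-x` is only honored if the queue is configured to allow exclusive jobs, so check with your administrators.

```shell
#!/bin/bash
# Write the job script to job.lsf (a hypothetical file name).
# The quoted 'EOF' keeps the shell from expanding anything inside the script.
cat > job.lsf <<'EOF'
#BSUB -L /bin/bash
#BSUB -n 10
#BSUB -J jobname
#BSUB -oo log/output.%J
#BSUB -eo log/error.%J
#BSUB -q queue_name
#BSUB -P project_name
#BSUB -R "span[hosts=1]"
#BSUB -x
#BSUB -W 2:0

mpirun ./someexecutable
EOF

# Submit on the cluster with: bsub < job.lsf
```

The only changes from the original header are the `span[hosts=1]` resource string and the added `-x` line.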

FYI, you can think of

#BSUB -R "span[ptile=x]"

as "put at most x slots on a single host". It does not multiply the number of slots requested with -n.
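
To make the arithmetic concrete, here is a small sketch under a simplified model: `-n` fixes the total number of slots, and `ptile` only caps how many of them land on one host. The function `min_hosts` is a hypothetical helper for illustration, not an LSF command.

```shell
#!/bin/bash
# Simplified model (an assumption, not LSF code): the job gets n slots in total,
# with at most ptile of them on any one host, so it needs at least
# ceil(n / ptile) hosts.
min_hosts() {
    local n=$1 ptile=$2
    echo $(( (n + ptile - 1) / ptile ))
}

min_hosts 10 12   # prints 1: all 10 slots fit on one 12-core host
min_hosts 24 12   # prints 2
```

So with `-n 10` and `span[ptile=12]` the request is still for 10 slots, not 120.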

Michael Closson