
Goal:

  1. learn how to run (co-schedule) multiple executables/applications within a single sbatch job submission
  2. using either srun or mpirun

Research:

Code snippet:

 #!/bin/bash
 #SBATCH --job-name LEBT 
 #SBATCH --partition=angel
 #SBATCH --nodelist=node38
 #SBATCH --sockets-per-node=1
 #SBATCH --cores-per-socket=1
 #SBATCH --time 00:10:00 
 #SBATCH --output LEBT.out

 # load the MPI environment (srun itself is provided by Slurm)
 module load openmpi


 srun  -n 1   ./LU.exe -i 100 -s 100  &
 srun  -n 1   ./BT.exe  &

 wait 

Man Pages:

 srun: https://computing.llnl.gov/tutorials/linux_clusters/man/srun.txt

 mpirun: https://www.open-mpi.org/doc/v1.8/man1/mpirun.1.php
itsmrbeltre
  • Your script would work if you requested at least two tasks with `--ntasks=2` – damienfrancois Nov 07 '16 at 20:14
  • @damienfrancois I was able to store the output of both applications with the answer I provided below. They seemed to execute in parallel, which made me think the threading is working properly, since they run at the same time. Obviously, if I execute application A (20s) and application B (50s) and they run in parallel, the job should finish around where B does (50s) or so. Am I correct? Now, is it okay to execute the applications in such a fashion, or am I doing something out of the ordinary? – itsmrbeltre Nov 07 '16 at 20:24
  • If that is the case, it means that your Slurm installation does not confine jobs to the CPUs they were allocated. On a cluster with cpusets or cgroups configured, your script would take 70s (except if they just sleep; see the sketch after this thread) – damienfrancois Nov 07 '16 at 20:31
  • @damienfrancois This is great information. Well, is there a way to make them run in parallel on the same node? – itsmrbeltre Nov 07 '16 at 20:37
  • use `--cpus-per-task=2` – damienfrancois Nov 07 '16 at 20:39
  • @damienfrancois Do you mind restructuring the job submission with the above executables, showing which flags to activate to make sure they run in parallel? I would appreciate it; that would also mean I do not need a Python script to turn the execution of each executable into a thread. – itsmrbeltre Nov 07 '16 at 20:45
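
To check whether an installation confines jobs to their allocated CPUs, a small test job can help. The following is a hypothetical sketch (not from the original thread; the job name, output file, and iteration count are arbitrary): it requests a single CPU and runs two CPU-bound loops, so the wall time reveals whether the loops actually overlap.

 #!/bin/bash
 #SBATCH --job-name=confinement-test   # hypothetical name
 #SBATCH --ntasks=1                    # deliberately request a single CPU
 #SBATCH --time=00:05:00
 #SBATCH --output=confinement.out

 # A fixed amount of CPU-bound shell arithmetic. Unlike sleep, this
 # competes for CPU time, so confinement shows up in the wall time.
 burn() {
     local i=0
     while [ "$i" -lt 5000000 ]; do i=$((i+1)); done
 }

 # With cgroup/cpuset confinement both loops share the single allocated
 # CPU and the wall time roughly doubles; without it they overlap.
 time { burn & burn & wait; }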

2 Answers


Your script will work modulo a minor modification. If you do not care whether or not your processes run on the same node, add #SBATCH --ntasks=2:

#!/bin/bash
#SBATCH --job-name LEBT 
#SBATCH --ntasks=2
#SBATCH --partition=angel
#SBATCH --nodelist=node38
#SBATCH --sockets-per-node=1
#SBATCH --cores-per-socket=1
#SBATCH --time 00:10:00 
#SBATCH --output LEBT.out

# load the MPI environment (srun itself is provided by Slurm)
module load openmpi

srun  -n 1 --exclusive  ./LU.exe -i 100 -s 100  &
srun  -n 1 --exclusive  ./BT.exe  &

wait 

The --exclusive argument tells srun to run with a subset of the whole allocation; see the srun manpage.

If you want both processes to run on the same node, use --cpus-per-task=2:

#!/bin/bash
#SBATCH --job-name LEBT 
#SBATCH --cpus-per-task=2
#SBATCH --partition=angel
#SBATCH --nodelist=node38
#SBATCH --sockets-per-node=1
#SBATCH --cores-per-socket=1
#SBATCH --time 00:10:00 
#SBATCH --output LEBT.out

# load the MPI environment (srun itself is provided by Slurm)
module load openmpi

srun  -c 1 --exclusive  ./LU.exe -i 100 -s 100  &
srun  -c 1 --exclusive  ./BT.exe  &

wait 

Note that, in that case, you must run srun with -c 1 rather than with -n 1.

damienfrancois
  • The second option is what I want to do, but I get the following error from each srun call: srun: error: --ntasks must be set with --exclusive (a possible workaround is sketched after this thread) – itsmrbeltre Nov 07 '16 at 21:25
  • Are you simply running your script? You are supposed to submit it with sbatch – damienfrancois Nov 07 '16 at 21:50
  • Yes, I am using sbatch to submit the job. When I try it without sbatch it fails. Instead, since srun did not allow parallel execution, can you set it up to run in parallel using "mpirun"? – itsmrbeltre Nov 07 '16 at 21:57
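
One possible workaround for that error, sketched here rather than taken from the original answer: the message says that a task count is required together with --exclusive, so passing an explicit -n 1 alongside -c 1 in the second script may satisfy the check (behaviour varies across Slurm versions):

srun -n 1 -c 1 --exclusive ./LU.exe -i 100 -s 100 &
srun -n 1 -c 1 --exclusive ./BT.exe &

wait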

After extensive research, I have concluded that "srun" is the command you want to use to run jobs in parallel. Moreover, you need a helper script to adequately orchestrate the whole process. I wrote the following script to execute the applications on one node with no problem.

#!/usr/bin/python
#SBATCH --job-name TPython
#SBATCH --output=ALL.out
#SBATCH --partition=magneto
#SBATCH --nodelist=node1

import threading
import os

# lock is created and passed to the threads, but never acquired
addlock = threading.Lock()

class jobs_queue(threading.Thread):
    """Thread that runs a single shell command to completion."""
    def __init__(self, job):
        threading.Thread.__init__(self, args=(addlock,))
        self.job = job

    def run(self):
        self.job_executor(self.job)

    def job_executor(self, cmd):
        # os.system blocks this thread until the command finishes
        os.system(cmd)

if __name__ == "__main__":

    joblist = ["srun ./executable2",
               "srun ./executable1 -i 20 -s 20"]

    # create one thread per job
    threads = [jobs_queue(job) for job in joblist]

    # start all jobs concurrently
    [t.start() for t in threads]

    # wait for every job to finish before the batch script exits
    [t.join() for t in threads]

Both executables in my particular case, with those particular flags activated, take around 55 seconds each. However, when run in parallel, they both finish in about 59 seconds.
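
For completeness, the helper script is submitted like any other batch script; the file name jobs.py below is hypothetical. sbatch reads the #SBATCH directives even though the interpreter line points at Python:

sbatch jobs.py    # output from both srun steps lands in ALL.out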

itsmrbeltre