
I am trying to set up preemption on my SLURM 19.05 cluster, but I cannot figure out how to make it work the way I planned.

Basically, I have two QOS levels:

$ sacctmgr show qos format=name,priority,preempt

      Name   Priority    Preempt
---------- ---------- ----------
    normal          0
   premium       5000     normal

These are the relevant settings in my configuration for preemption:

# SCHEDULING

SelectType=select/cons_res
FastSchedule=1
SelectTypeParameters=CR_CPU_Memory    
PreemptType=preempt/qos
PreemptMode=SUSPEND,GANG

PriorityType=priority/multifactor
PriorityWeightFairshare=10000
PriorityWeightAge=10000
PriorityWeightJobSize=10000
PriorityFavorSmall=YES
PriorityWeightQOS=10000

PartitionName=Compute OverSubscribe=FORCE:1 State=UP Nodes=compute01,compute02

My plan was to let a premium job preempt a normal job, keeping the normal job suspended until the premium job finishes running on the cluster.

However, the preemption I observe time-slices the two jobs, suspending them alternately every 30 seconds. Have I missed something in the configuration files, or can SLURM simply not offer the preemption I want, with no time slicing of the resources?

Woody

1 Answer

The problem is that PreemptMode=SUSPEND,GANG with PreemptType=preempt/qos results in timeslicing.

You must either set PreemptType to preempt/partition_prio, resulting in "suspend and automatically resume the low priority jobs", or set PreemptMode to REQUEUE, where jobs will be aborted and put back in the queue.

As far as I know these are the options closest to what I think you want.
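A minimal sketch of the first option, adapted from the configuration in the question. The partition names normal and premium and the PriorityTier values are assumptions chosen for illustration. With preempt/partition_prio, the partition a job runs in decides which job preempts which, not the Preempt field of the QOS.

```
# Sketch only: partition names and PriorityTier values are illustrative assumptions.
PreemptType=preempt/partition_prio
PreemptMode=SUSPEND,GANG

# Both partitions cover the same nodes; jobs in the higher PriorityTier
# suspend jobs in the lower one.
PartitionName=normal  Nodes=compute01,compute02 PriorityTier=1 OverSubscribe=FORCE:1 Default=YES State=UP
PartitionName=premium Nodes=compute01,compute02 PriorityTier=2 OverSubscribe=FORCE:1 State=UP
```

Under this layout, premium work would be submitted to the higher-tier partition, for example with sbatch -p premium.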

https://slurm.schedmd.com/slurm.conf.html#PreemptMode

GANG enables gang scheduling (time slicing) of jobs in the same partition. NOTE: Gang scheduling is performed independently for each partition, so configuring partitions with overlapping nodes and gang scheduling is generally not recommended.

REQUEUE preempts jobs by requeuing them (if possible) or canceling them. For jobs to be requeued they must have the --requeue sbatch option set or the cluster wide JobRequeue parameter in slurm.conf must be set to one.

SUSPEND If PreemptType=preempt/partition_prio is configured then suspend and automatically resume the low priority jobs. If PreemptType=preempt/qos is configured, then the jobs sharing resources will always time slice rather than one job remaining suspended. The SUSPEND may only be used with the GANG option (the gang scheduler module performs the job resume operation).
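For the REQUEUE alternative, a minimal sketch that keeps preempt/qos and the existing QOS definitions might look like the following. Only the lines shown change; the rest of the configuration from the question is assumed to stay as it is.

```
# Sketch only: the rest of the question's configuration is assumed unchanged.
PreemptType=preempt/qos
PreemptMode=REQUEUE

# Allow batch jobs to be requeued cluster-wide
# (per-job alternative: submit with sbatch --requeue).
JobRequeue=1
```

Note that a requeued normal job starts over from the beginning when it runs again; it does not resume from where it was preempted.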

Lord Ingo