in slurm, what will happen if the resources I required is not enough during the running of the job?
For example, #SBATCH --memory=10G; #SBATCH --cpus-per-task=2; python mytrain.py
is in myscript.sh
. After I run sbatch myscript.sh
the job is allocated the required cpu (2) and memory (10 G) successfully. But during the running of the job, the program need more memory than 10 Gb (like loading a big video dataset), I found the job would not be killed. The job will still work normally.
So my question is: is there any side effect when I underestimate the resource I need? (memory seems okay, but is it stll okay if the required cpu number is not enough?)

- 1,108
- 2
- 12
- 28
1 Answers
Slurm can be configured to constrain the jobs into their resource requests(the most usual setup) , which does not seem to be the case in the cluster you are using.
If it were the case, your job would be killed when trying to use more memory than requested, and it would be limited to the physical CPUs you requested.
In your case, using more memory than requested can lead to memory exhaustion on the node on which your job is running, possibly, having your processes (but also possibly processes of other jobs on the same node!), killed by the OOM killer. Using more CPUs than requested means the processes started by your job will compete with the processes of other jobs for the same physical CPU, leading to a general slow-down of all jobs on the node because of a large number of context switches. Jobs being slowed down can then exceed their maximum time and get killed.
Underestimating resources can thus lead to loss of your jobs. If nodes are shared among jobs, it can also lead to loss of jobs from other users.

- 52,978
- 9
- 96
- 110