
When we submit a job via sbatch, job IDs are assigned in incremental order. Based on my observation, the numbering starts again from 1.

sbatch  -N1 run.sh
Submitted batch job 20

The goal is to change the submitted batch job's ID, if possible.

[Q1] Suppose there is a job running under Slurm. When we reboot the node, does the job continue running? And does its job ID get updated, or does it stay as it was before?

[Q2] Is it possible to assign or change the ID of a submitted job to a unique ID chosen by the cluster owner?

Thank you for your valuable time and help.

alper

1 Answer


If the node fails, the job is requeued - if this is permitted by the JobRequeue parameter in slurm.conf. It will get the same Job ID as the previously started run since this is the only identifier in the database for managing the jobs. (Users can override requeueing with the --no-requeue sbatch parameter.)

No, it's not possible to change Job IDs.
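
To see the requeue behaviour from inside a job, a minimal sketch of a `run.sh` might look like the following. It assumes the standard `SLURM_JOB_ID` and `SLURM_RESTART_COUNT` environment variables that Slurm sets for batch jobs; the function name `describe_run` is only illustrative, and the fallback defaults exist so the script can also be run outside Slurm.

```shell
#!/bin/bash
#SBATCH -N1
#SBATCH --requeue    # allow Slurm to requeue this job after a node failure

# SLURM_JOB_ID and SLURM_RESTART_COUNT are set by Slurm inside a job.
# The defaults below are assumptions so the sketch also runs outside Slurm.
describe_run() {
    local job_id="${SLURM_JOB_ID:-unknown}"
    local restarts="${SLURM_RESTART_COUNT:-0}"
    if [ "$restarts" -gt 0 ]; then
        # A requeued job keeps its Job ID but starts over.
        echo "job $job_id: restart $restarts, starting again from the beginning"
    else
        echo "job $job_id: first run"
    fi
}

describe_run
```

If the job is requeued, the printed Job ID should stay the same across runs while the restart count increases.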

ciaron
  • `JobRequeue=1` was commented out in my slurm.conf file. If I enable it, as I understand it, the job will be requeued, but it will start running from the beginning and will not continue from where it left off before the shutdown. With `--no-requeue`, the job will not run again after the node restarts, right? @ciaron – alper Apr 12 '17 at 11:01
  • With `JobRequeue=0` or `--no-requeue`, the job will not restart automatically - otherwise it will restart from the beginning. If you want jobs to restart where they left off, you may want to look into checkpoint/restart with [BLCR](https://slurm.schedmd.com/checkpoint_blcr.html) – ciaron Apr 16 '17 at 11:59
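
As a rough illustration of restarting where a job left off, here is a sketch of application-level checkpointing inside the job script itself. This is not BLCR; it is a simpler stand-in technique that only works when the job can be split into resumable steps. The checkpoint file name `progress.ckpt`, the `TOTAL_STEPS` variable, and the `run_steps` function are all hypothetical.

```shell
#!/bin/bash
#SBATCH -N1
#SBATCH --requeue    # let Slurm requeue the job so it can resume

# Hypothetical checkpoint file and step count; both are assumptions.
CKPT="${CKPT:-progress.ckpt}"
TOTAL_STEPS="${TOTAL_STEPS:-5}"

run_steps() {
    local step=1
    # Resume after the last completed step if a checkpoint exists.
    if [ -f "$CKPT" ]; then
        step=$(( $(cat "$CKPT") + 1 ))
    fi
    while [ "$step" -le "$TOTAL_STEPS" ]; do
        echo "running step $step"    # placeholder for the real work
        echo "$step" > "$CKPT"       # record progress after each step
        step=$(( step + 1 ))
    done
}

run_steps
```

When the requeued job starts again from the top of the script, it reads the checkpoint file and skips the steps that already completed. The file needs to live on storage that survives the node reboot.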