I'd like to create an array of SLURM workers, and whenever one of those workers finishes its work, I'd like to restart the worker.
If it were possible to run jobs of infinite duration on my queue, I'd of course do that instead, but because this isn't possible, I thought I'd just create an infinite series of workers.
Is this possible in SLURM? I thought I could submit an sbatch command from inside the last worker in my worker array to restart the entire sequence, but the compute nodes that the workers run on in my cluster don't have access to the sbatch executable.
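For concreteness, here is a rough sketch of the approach I had in mind, written as a Python worker. The script name `worker.sh` and the helper names are placeholders I made up for illustration; the `SLURM_ARRAY_TASK_*` environment variables are the ones SLURM sets for array tasks. It also assumes the task with the highest index is the last one to finish, which may not hold in practice.

```python
import os
import shutil
import subprocess


def is_last_array_task(env):
    """Return True if this task has the highest index in its job array."""
    task_id = env.get("SLURM_ARRAY_TASK_ID")
    task_max = env.get("SLURM_ARRAY_TASK_MAX")
    if task_id is None or task_max is None:
        # Not running inside a SLURM job array.
        return False
    return int(task_id) == int(task_max)


def resubmit_command(script, task_min, task_max):
    """Build the sbatch command that would resubmit the whole array."""
    return ["sbatch", f"--array={task_min}-{task_max}", script]


def main():
    env = os.environ

    # ... the worker's actual work would go here ...

    if not is_last_array_task(env):
        return

    # This is where it fails on my cluster: sbatch is not on the compute nodes.
    if shutil.which("sbatch") is None:
        print("sbatch not found on this node; cannot resubmit")
        return

    cmd = resubmit_command(
        "worker.sh",  # hypothetical batch script name
        env.get("SLURM_ARRAY_TASK_MIN", "0"),
        env["SLURM_ARRAY_TASK_MAX"],
    )
    subprocess.run(cmd, check=True)


if __name__ == "__main__":
    main()
```

On my cluster the `shutil.which("sbatch")` check comes back empty on the compute nodes, so the resubmission step never runs.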
Any pointers on this question would be super helpful!