I'm training several Neural Networks on a server in my University. Due to limited resources for all the students, there is a job scheduling system called (Slurm) that queues all students runs and in addition, we are only allowed to run our commands with a time limit (24h). Once exceed this processing time, our running process is closed to give resource availability to the others.
Specifically, I'm training GAN's and I need more training time than 24h. Right now, I'm saving the checkpoints of my model to restart from the same training point before the process closure. But, I must execute the same command again after 24h.
For this reason I would like to schedule this execution every 24h automatically.
Currently I'm using 'tmux' to execute the command and be able to close the terminal.
Some suggestion on how to automate this kind of execution?
Thank you in advance!