
I'm running some tests on a cluster, and they require me to restart the execution host but not the master. How can I pause the job that is currently running, restart the exec. host, and then continue running that job from where it left off?

Would I need to start a job on the master system from the execution host in order to wake up the job on the exec. host?

  • What do you mean by "restart the execution host"? Do you mean restart sge_execd? Or reboot the execution server? – Vince Jul 21 '14 at 16:56
  • Reboot the execution server – user3854700 Jul 21 '14 at 18:07
  • If you reboot the server, how can you resume a job ... unless you save intermediate results to a file. Upon reboot, memory will be reset and you will lose the progress. GridEngine can restart a job that failed (i.e. job was running or in queue, but due to reboot did not finish), but I fail to see how it can resume a job given that the memory of the execution host was wiped on reboot. – Vince Jul 22 '14 at 00:27
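To illustrate the "save intermediate results to a file" idea from the last comment, here is a minimal Python sketch of application-level checkpointing. It is not a Grid Engine feature: the job itself writes its progress to disk after each step and reads it back when it is started again after the reboot. The file name `checkpoint.json`, the functions `load_state`, `save_state` and `run`, and the summing workload are all hypothetical placeholders, and the sketch assumes the checkpoint lives on storage that survives the reboot (for example a shared filesystem).

```python
import json
import os
import tempfile

# Hypothetical path; it must be on storage that survives the reboot.
CHECKPOINT = "checkpoint.json"


def load_state(path=CHECKPOINT):
    """Return the saved state, or a fresh state if no checkpoint exists."""
    if os.path.exists(path):
        with open(path) as f:
            return json.load(f)
    return {"next_step": 0, "total": 0}


def save_state(state, path=CHECKPOINT):
    """Write the checkpoint to a temp file, then rename it into place,
    so an interruption mid-write does not leave a truncated checkpoint."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as f:
        json.dump(state, f)
        f.flush()
        os.fsync(f.fileno())
    os.replace(tmp, path)


def run(n_steps, path=CHECKPOINT, stop_after=None):
    """Placeholder workload: sum the integers 0..n_steps-1, one per step.

    stop_after is only for demonstration: it simulates an interruption
    after that many steps in the current run.
    """
    state = load_state(path)
    done_this_run = 0
    while state["next_step"] < n_steps:
        if stop_after is not None and done_this_run >= stop_after:
            return state  # simulated interruption (e.g. the reboot)
        state["total"] += state["next_step"]
        state["next_step"] += 1
        save_state(state, path)  # checkpoint after every completed step
        done_this_run += 1
    return state
```

Calling `run(10, stop_after=4)` and then `run(10)` continues from step 4 instead of starting over. With a job written this way, having the scheduler rerun it after the reboot is enough, because the job resumes itself from its own checkpoint; a real workload would usually checkpoint less often than every step.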

0 Answers