I am trying to revive or resubmit jobs running on an SGE scheduler that get stuck after a node crash or after AWS spot instances are taken away. Can someone help with resuming such jobs? I have been trying to understand the usage of qsub, but I have not been able to configure anything that automatically resubmits such jobs.
I am also unable to configure my queue using the qconf command, since only the root and sge_admin users can run it. I do have root privileges, but the command asks me to set the SGE_ROOT environment variable. I set it, but the command still throws the same error asking me to set the variable.
Any sort of assistance would be highly appreciated.