I am running jobs on Condor and have noticed that a subset of my jobs start running but never complete. Is there a setting in the submit file that kills and then resubmits a job if it takes longer than a certain amount of time to complete? This is similar to the question "Condor Timeout for idle jobs", except that I want Condor to resubmit the jobs as well, not simply kill them.

Thanks!

shadowprice

1 Answer

You can use the KILL transition expression in the machine's ClassAd configuration file (see the Condor user manual). Something like:

START = True
...
# maximum run time, in seconds
MaxJobExecutionTime = xxx
KILL = $(ActivityTimer) > $(MaxJobExecutionTime)

With this setting, the machine kills any job that runs longer than MaxJobExecutionTime seconds. Condor then puts the job back in the queue and retries it.
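
Since the question asks about the submit file specifically, here is a minimal sketch of a submit-side alternative. It is not part of the answer above. It assumes a one-hour limit and a cap of five starts, both placeholder values, and relies on the job attributes JobStatus, EnteredCurrentStatus, HoldReasonCode and NumJobStarts. The idea is to put a job on hold once it has been running too long, then release it so that it is scheduled again:

```
# Submit description file fragment (illustrative values, adjust to your jobs)

# Hold the job once it has been in the Running state (JobStatus == 2)
# for more than 3600 seconds.
periodic_hold = (JobStatus == 2) && ((time() - EnteredCurrentStatus) > 3600)

# Release the job again if it was held by periodic_hold (HoldReasonCode == 3)
# and has started fewer than 5 times, so a job that always hangs is not retried forever.
periodic_release = (JobStatus == 5) && (HoldReasonCode == 3) && (NumJobStarts < 5)
```

A released job returns to the idle state and is matched to a machine again. Whether these expressions behave exactly this way depends on the Condor version and pool configuration, so check them against the condor_submit documentation for your installation.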

pfnuesel
SCa