Stop recurring Autosys job from scheduling after first failure

Question

We have a recurring job, JOB_A, which runs every 15 minutes. If it fails, we have to force start another box, BOX_TO_FIX, to fix the issue.

The problem is that our Operations team takes 20-30 minutes to respond to a failure of JOB_A. Before they can start BOX_TO_FIX, the recurring job JOB_A starts again and fails a second time.

Our concern is that another operator may pick up this second alert and run BOX_TO_FIX a second time, which we have to avoid.

Is it possible to stop the recurring job JOB_A from being scheduled again after its first failure? If its status is failed, it should not start again until we have fixed the cause of the failure.

Depending on what JOB_A is and how it fails, you might be able to have that job put itself on ice in case of failure with the sendevent command `sendevent -E JOB_ON_ICE -J "JOB_A"` - HBruijn, Aug 22 '16 at 18:25

score 0 · Answer 1 · answered Oct 17 '16 at 19:26

This sounds like two workflow issues:

1. Running BOX_TO_FIX when JOB_A fails.
2. Not allowing JOB_A to run once it has failed, until BOX_TO_FIX has run.

Is it feasible to set a failure(JOB_A) condition on BOX_TO_FIX so it will automatically start up when JOB_A fails?
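
If that is feasible, the dependency might look like the fragment below. This is only a sketch: it assumes BOX_TO_FIX is already defined and has no other starting condition, so `update_job` is used rather than redefining the box.

```
/* Sketch: start BOX_TO_FIX automatically when JOB_A fails */
update_job: BOX_TO_FIX
condition: failure(JOB_A)
```

With a condition like this, operators would no longer need to force start the box by hand after an alert.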

Regardless of that answer, you can set a global variable which disables JOB_A on its failure until it is reset by BOX_TO_FIX's success.

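/* JOB_A runs only while the flag is 0 */
insert_job: JOB_A
condition: value(JOB_A_IS_BROKEN) = 0
/* ...remaining JOB_A attributes unchanged... */

/* Raises the flag as soon as JOB_A fails */
insert_job: OMG_A_BROKE
condition: failure(JOB_A)
command: sendevent -E SET_GLOBAL -G "JOB_A_IS_BROKEN=1"

/* Runs last inside the box and clears the flag once the fix has succeeded */
insert_job: BOX_TO_FIX_IS_FINISHED
box_name: BOX_TO_FIX
/* "last cmd in BOX_TO_FIX" is a placeholder for the name of the final job in the box */
condition: success(last cmd in BOX_TO_FIX)
command: sendevent -E SET_GLOBAL -G "JOB_A_IS_BROKEN=0"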
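
One detail the JIL above leaves open is the initial value of the flag. A condition on a global variable that has never been set will most likely not evaluate as true, so JOB_A would probably never start until JOB_A_IS_BROKEN has been set to 0 once. A minimal sketch of two helpers for that, assuming the AutoSys client environment is already sourced so that `sendevent` and `autorep` are on the PATH (the function names are made up for illustration):

```shell
#!/bin/sh
# Sketch only: helpers around the JOB_A_IS_BROKEN flag used in the JIL above.
# set_job_a_flag and show_job_a_flag are hypothetical names, not AutoSys commands.

# Set the flag: 0 lets JOB_A run, 1 blocks it.
set_job_a_flag() {
    sendevent -E SET_GLOBAL -G "JOB_A_IS_BROKEN=$1"
}

# Print the current value of the flag as reported by autorep.
show_job_a_flag() {
    autorep -G JOB_A_IS_BROKEN
}
```

Running `set_job_a_flag 0` once before the first scheduled run of JOB_A initializes the flag, and `show_job_a_flag` lets an operator check whether JOB_A is currently blocked before touching BOX_TO_FIX.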
