We run Quartz.NET for a variety of jobs on different schedules ranging from every 30 seconds to once a week.
Reviewing our internal logs, we found that some jobs stopped running for no discernible reason while others continued. For example, our every-30-second job stopped at one point, while a different every-10-minute job kept running for a few hours and then also stopped. A daily task stopped later still.
We enabled Quartz logging and found the following.
LOG OF PREVIOUS FIRE, WHICH WAS SUCCESSFUL:
2014-09-19 08:20:00.0130 DEBUG Producing instance of Job 'DEFAULT.Scheduled task #5', class=TaskRunner
2014-09-19 08:20:00.0130 DEBUG Calling Execute on job DEFAULT.Scheduled task #5
2014-09-19 08:20:00.0130 DEBUG Batch acquisition of 1 triggers
2014-09-19 08:20:00.8710 DEBUG Trigger instruction : NoInstruction
2014-09-19 08:20:00.8710 DEBUG Batch acquisition of 1 triggers
LOG OF FIRST FAILURE:
2014-09-19 08:30:00.0046 DEBUG Producing instance of Job 'DEFAULT.Scheduled task #5', class=TaskRunner
2014-09-19 08:30:00.0046 DEBUG Calling Execute on job DEFAULT.Scheduled task #5
2014-09-19 08:30:00.0046 DEBUG Batch acquisition of 1 triggers
After this, the job never ran again until we restarted the service. Unlike the successful fire, the log shows no "Trigger instruction" line after "Calling Execute". We also do our own logging internally, and it recorded nothing at that time, so there is no indication that any of our code ran for this fire.
Our misfire handling is configured for every job as follows:
... TriggerBuilder.Create()
    .WithCronSchedule(task.CronSchedule, x => x.WithMisfireHandlingInstructionDoNothing())
    .Build();
As I understand it, the "DoNothing" instruction tells Quartz to skip the misfired execution and continue with the schedule. So if a misfire had occurred, I would expect the job to fire again at its next scheduled time.
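To make that expectation concrete, here is a minimal sketch of what I would expect the trigger to report. It builds a trigger the same way as above and asks it for its upcoming fire times. The cron expression "0 0/10 * * * ?" is an assumption standing in for task.CronSchedule (it matches the ten-minute spacing in the log above), and the class name is hypothetical; this is not our production code.

```csharp
using System;
using Quartz;

public static class MisfireExpectationSketch
{
    public static void Main()
    {
        // Assumption: a ten-minute cron expression standing in for task.CronSchedule.
        const string assumedCron = "0 0/10 * * * ?";

        // Same builder calls as in our configuration above.
        ITrigger trigger = TriggerBuilder.Create()
            .WithCronSchedule(assumedCron, x => x.WithMisfireHandlingInstructionDoNothing())
            .Build();

        // Check that the misfire instruction on the built trigger is DoNothing.
        bool isDoNothing =
            trigger.MisfireInstruction == MisfireInstruction.CronTrigger.DoNothing;
        Console.WriteLine("Misfire instruction is DoNothing: " + isDoNothing);

        // Walk the schedule forward: these are the times at which I would
        // expect the job to fire again, whether or not one fire was skipped.
        DateTimeOffset? next = trigger.GetFireTimeAfter(DateTimeOffset.UtcNow);
        for (int i = 0; i < 3 && next.HasValue; i++)
        {
            Console.WriteLine("Expected fire: " + next.Value.ToLocalTime());
            next = trigger.GetFireTimeAfter(next);
        }
    }
}
```

If the schedule itself still yields future fire times like this, a skipped fire alone should not explain why the job stops permanently.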
1) Why do our Quartz jobs stop running at seemingly random times?
2) What can we do to investigate further?