
I have an issue with Hangfire (currently version 1.6.19) using MongoDB as the storage. We have a method that is scheduled as follows:

BackgroundJob.Schedule(() => DoAsyncTask(parameters, JobCancellationToken.Null), TimeSpan.FromMinutes(X))

The task runs for over an hour and contains a loop that checks whether the job has completed. Inside the loop there is a call to cancellationToken.ThrowIfCancellationRequested() to detect whether cancellation has been requested, but this call keeps firing approximately 30 minutes into the execution and terminates the job before completion.
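For clarity, the loop in question looks roughly like this (a minimal sketch; the method body and the IsWorkComplete check are illustrative, not the actual implementation):

```csharp
public void DoAsyncTask(string parameters, IJobCancellationToken cancellationToken)
{
    // Long-running loop that polls until the external work is done.
    while (!IsWorkComplete()) // hypothetical completion check
    {
        // Throws JobAbortedException when Hangfire requests cancellation,
        // e.g. on server shutdown or when the job is re-queued.
        cancellationToken.ThrowIfCancellationRequested();

        Thread.Sleep(TimeSpan.FromSeconds(30)); // poll interval
    }
}
```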

I have been searching for information on this issue, but most of what I found relates to older versions or to the InvisibilityTimeout setting, which according to this answer has been deprecated. I would like to know if anyone else has encountered this problem and whether there are any known solutions.

Thank you

EDIT: After further investigation, I discovered that the cancellation was only a side effect of Hangfire invoking the task again after 30 minutes of running. Because I had validation in place inside the method to prevent re-entry while the process was still running (to avoid duplicating data), the second invocation treated the process as completed, and the job was therefore cancelled.

So the real problem is that I'm unable to determine why Hangfire keeps invoking the process again after approximately 30 minutes of execution. I followed the steps described here to configure the application on IIS to always run and to prevent the pool from being recycled, but the behavior persisted.
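For reference, the 30-minute interval matches Hangfire's historical default invisibility timeout, after which an unconfirmed job is handed to another worker. A sketch of how that option would be raised, assuming the version of Hangfire.Mongo in use still exposes it on MongoStorageOptions (this is an assumption; verify against your version):

```csharp
// Sketch only: assumes MongoStorageOptions exposes InvisibilityTimeout.
var storageOptions = new MongoStorageOptions
{
    // Longer than the longest expected job, so a running job is not
    // re-fetched by another worker after the default 30 minutes.
    InvisibilityTimeout = TimeSpan.FromHours(2)
};

GlobalConfiguration.Configuration
    .UseMongoStorage("mongodb://localhost", "hangfire", storageOptions);
```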

— Armando Bracho

3 Answers


The solution I implemented for my problem was to use this filter to take a distributed lock on the job until it has properly finished. I made small changes to the implementation to include the job ID and to update the calls to the objects used in this version of Hangfire, so I will leave it here:

public class SkipConcurrentExecutionAttribute : JobFilterAttribute, IServerFilter
{
    private static readonly Logger logger = LogManager.GetCurrentClassLogger();

    private readonly int _timeoutInSeconds;

    public SkipConcurrentExecutionAttribute(int timeoutInSeconds)
    {
        if (timeoutInSeconds <= 0) throw new ArgumentException("Timeout argument value should be greater than zero.");

        _timeoutInSeconds = timeoutInSeconds;
    }


    public void OnPerforming(PerformingContext filterContext)
    {
        // Lock resource is unique per job type, method and job ID.
        var resource = $"{filterContext.BackgroundJob.Job.Type.FullName}.{filterContext.BackgroundJob.Job.Method.Name}.{filterContext.BackgroundJob.Id}";

        var timeout = TimeSpan.FromSeconds(_timeoutInSeconds);

        try
        {
            var distributedLock = filterContext.Connection.AcquireDistributedLock(resource, timeout);
            filterContext.Items["DistributedLock"] = distributedLock;
        }
        catch (Exception)
        {
            // The lock is already held by another execution: skip this run.
            filterContext.Canceled = true;
            logger.Warn("Cancelling run for {0} job, id: {1}", resource, filterContext.BackgroundJob.Id);
        }
    }

    public void OnPerformed(PerformedContext filterContext)
    {
        if (!filterContext.Items.ContainsKey("DistributedLock"))
        {
            throw new InvalidOperationException("Cannot release a distributed lock: it was not acquired.");
        }

        var distributedLock = (IDisposable)filterContext.Items["DistributedLock"];
        distributedLock.Dispose();
    }
}

So the call to the background process is now:

[SkipConcurrentExecution(300)]
public async Task DoAsyncTask(parameters, IJobCancellationToken cancellationToken){
    //code execution here
}
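If you prefer not to annotate each method individually, the filter can also be registered globally using Hangfire's standard filter registration (the attribute name comes from the code above):

```csharp
// Applies the distributed-lock filter to every background job.
GlobalJobFilters.Filters.Add(new SkipConcurrentExecutionAttribute(300));
```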

I hope this helps. The reason for the re-entry is still unknown, so please feel free to extend this answer with any information you might find.

— Armando Bracho

Having the same issue with Hangfire.Core 1.7.6 and Hangfire.Mongo 0.5.6 in a Service Fabric cluster, I added PerformContext to my job using this guide.

This makes it possible to get the ID of the current job: var jobId = performContext.BackgroundJob.Id;
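The usual pattern (per the linked guide) is to declare the parameter on the job method and pass null when scheduling; Hangfire substitutes the real context at execution time. A sketch with illustrative names:

```csharp
// PerformContext is filled in by Hangfire; pass null when scheduling.
public void DoAsyncTask(string parameters, PerformContext performContext)
{
    var jobId = performContext.BackgroundJob.Id;
    // ... use jobId for the succeeded-job check ...
}

// Scheduling call:
BackgroundJob.Schedule(() => DoAsyncTask("...", null), TimeSpan.FromMinutes(5));
```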

The job that is scheduled to restart after 30 minutes has the same job ID, so it is possible to check whether a job with the same ID has already succeeded:

var backgroundJob = performContext.BackgroundJob;
var monitoringApi = JobStorage.Current.GetMonitoringApi();
var succeededCount = (int)monitoringApi.SucceededListCount();
if (succeededCount > 0) 
{
    var queryCount = Math.Min(succeededCount, 1000);

    // read up to 1000 latest succeeded jobs:
    var succeededJobs = monitoringApi.SucceededJobs(succeededCount - queryCount, queryCount);

    // check if job with the same ID already finished:
    if (succeededJobs.Any(succeededKp => backgroundJob.Id == succeededKp.Key)) 
    {
        // The job was already started and succeeded, skip this execution
        return;
    }
}

NOTE: The job method must also be annotated so that it does not start concurrently. The timeout should have a reasonable limit, e.g. 6 hours: [DisableConcurrentExecution(6 * 60 * 60)]. Otherwise the second job could start after 30 minutes rather than after the first job finishes.
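Putting both parts together, the job method would look roughly like this (a sketch; the method name, body, and the JobAlreadySucceeded helper wrapping the check above are illustrative):

```csharp
[DisableConcurrentExecution(6 * 60 * 60)] // timeout in seconds
public void DoAsyncTask(string parameters, PerformContext performContext)
{
    // Hypothetical helper encapsulating the succeeded-jobs lookup above.
    if (JobAlreadySucceeded(performContext))
    {
        return; // skip the duplicate execution
    }

    // ... actual work ...
}
```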

— petrsyn

I had the same issue and spent a lot of time looking for a solution in Hangfire topics. But then I noticed that the cancellation was fired only after a console event.

So the problem is not in Hangfire itself but in the Hangfire.Console project. Do you use this extension? Switching to another logging method solved all my problems.

  • This is not an answer! Please consider adding more details about issues in `hangfire.console` – helcode Jul 14 '18 at 19:14
  • This is exactly the answer that I needed before wasting tons of time searching for the solution. Hangfire.Console is an optional extension that can easily be replaced. There are several open issues on the Hangfire GitHub and the same questions on StackOverflow, and I've seen the Console on screenshots there. This bug looks exactly like a bug in old Hangfire versions, so it's important to know the difference. – Nickolay Klestov Jul 14 '18 at 22:34
  • Hello @NickolayKlestov, thank you for your input. Unfortunately, in my case I am not using the Hangfire.Console extension. I did try it for a while after the problem started, to see if I could get some additional insight, but the same behavior persisted with or without the extension, so I know that is not the cause of this problem. What method of logging did you switch to to solve your problems? – Armando Bracho Jul 15 '18 at 21:43