0

I sometimes get the following exception, when a job gets executed:

System.Data.SqlClient.SqlException (0x80131904): Timeout expired.  The timeout period elapsed prior to completion of the operation or the server is not responding.
 ---> System.ComponentModel.Win32Exception (258): The wait operation timed out.
   at System.Data.SqlClient.SqlConnection.OnError(SqlException exception, Boolean breakConnection, Action`1 wrapCloseInAction)
   at System.Data.SqlClient.TdsParser.ThrowExceptionAndWarning(TdsParserStateObject stateObj, Boolean callerHasConnectionLock, Boolean asyncClose)
   at System.Data.SqlClient.TdsParser.TryRun(RunBehavior runBehavior, SqlCommand cmdHandler, SqlDataReader dataStream, BulkCopySimpleResultSet bulkCopyHandler, TdsParserStateObject stateObj, Boolean& dataReady)
   at System.Data.SqlClient.SqlCommand.FinishExecuteReader(SqlDataReader ds, RunBehavior runBehavior, String resetOptionsString)
   at System.Data.SqlClient.SqlCommand.RunExecuteReaderTds(CommandBehavior cmdBehavior, RunBehavior runBehavior, Boolean returnStream, Boolean async, Int32 timeout, Task& task, Boolean asyncWrite, SqlDataReader ds)
   at System.Data.SqlClient.SqlCommand.InternalExecuteNonQuery(TaskCompletionSource`1 completion, Boolean sendToPipe, Int32 timeout, Boolean asyncWrite, String methodName)
   at System.Data.SqlClient.SqlCommand.ExecuteNonQuery()
   at Dapper.SqlMapper.ExecuteCommand(IDbConnection cnn, CommandDefinition& command, Action`2 paramReader)
   at Dapper.SqlMapper.ExecuteImpl(IDbConnection cnn, CommandDefinition& command)
   at Hangfire.SqlServer.SqlServerDistributedLock.Acquire(IDbConnection connection, String resource, TimeSpan timeout)
   at Hangfire.SqlServer.SqlServerConnection.AcquireLock(String resource, TimeSpan timeout)
   at Hangfire.DisableConcurrentExecutionAttribute.OnPerforming(PerformingContext filterContext)
   at Hangfire.Profiling.ProfilerExtensions.InvokeAction[TInstance](InstanceAction`1 tuple)
   at Hangfire.Profiling.SlowLogProfiler.InvokeMeasured[TInstance,TResult](TInstance instance, Func`2 action, String message)
   at Hangfire.Profiling.ProfilerExtensions.InvokeMeasured[TInstance](IProfiler profiler, TInstance instance, Action`1 action, String message)
   at Hangfire.Server.BackgroundJobPerformer.InvokePerformFilter(IServerFilter filter, PerformingContext preContext, Func`1 continuation)
ClientConnectionId:cf3220e9-bbf5-40f2-8dc5-0d7836736771
Error Number:-2,State:0,Class:11

Here's the job method which fails:

[AutomaticRetry(Attempts = 0)]
[DisableConcurrentExecution(30 * 60)]
public async System.Threading.Tasks.Task ExecuteAsync(int id, PerformContext context, CancellationToken cancellationToken)
{
    ...
}

As the timeout of DisableConcurrentExecutionAttribute is already set to 30 minutes, what's the problem here? I even increased the CommandTimeout in the hangfire sql server configuration to 60 seconds:

builder.Services.AddHangfire((provider, config) => config
    .SetDataCompatibilityLevel(CompatibilityLevel.Version_170)
    .UseSimpleAssemblyNameTypeSerializer()
    .UseRecommendedSerializerSettings()
    .UseSqlServerStorage(databaseConnectionStr, new SqlServerStorageOptions
    {
        CommandBatchMaxTimeout = TimeSpan.FromMinutes(5),
        CommandTimeout = TimeSpan.FromSeconds(60),
        SlidingInvisibilityTimeout = TimeSpan.FromMinutes(5),
        QueuePollInterval = TimeSpan.Zero,
        UseRecommendedIsolationLevel = true,
        DisableGlobalLocks = true,
        PrepareSchemaIfNecessary = false
    })
    .WithJobExpirationTimeout(TimeSpan.FromDays(30 * 6))
    .UseFilter(provider.GetRequiredService<SendEMailOnFailureFilter>())
    .UseConsole());

If I manually re-queue the same job, it works everytime.

UPDATE:

Here's a screenshot of a failed job where we can see, that it's already failing after just 37 total seconds, which is less than the timeout for DisableConcurrentExecution and the CommandTimeout (SQL) configured in hangfire. On another instance, it failed after just 27 seconds.

enter image description here

UPDATE 2:

As @jaroslav suggested, I tried a dummy job with a 70 second sleep/delay, but that did work flawlessly. Even the DisableConcurrentExecution attribute worked well:

[AutomaticRetry(Attempts = 0)]
[DisableConcurrentExecution(30 * 60)]
public override async Task ExecuteAsync(PerformContext context, CancellationToken cancellationToken)
{
    await Task.Delay(70 * 1000, cancellationToken);
}
roli09
  • 817
  • 1
  • 8
  • 20
  • To my mind setting both `CommandTimeout` and `CommandBatchMaxTimeout` should be enough. At least according to the Hangfire code - [SqlServerConnection](https://github.com/HangfireIO/Hangfire/blob/master/src/Hangfire.SqlServer/SqlServerConnection.cs#L435). Could you try firing a job only with `Thread.Sleep()` to make sure its Hangfire which timeouts and not some internal sql queries that timeouts within your Hangfire job? – Jaroslav Feb 02 '23 at 08:57
  • @Jaroslav I tried your suggestion (see Update 2 in my original post), but that did work flawlessly. I configured some event logging on the sql server instance, so maybe that gives me some more insights the next time it occurs. – roli09 Feb 02 '23 at 10:10

1 Answers1

0

I suspect that hangfire is doing some other work while executing the task. This exceeds the default sqlCommandTimeout (30 seconds). Try increasing the default SqlCommand timeout:

using (SqlCommand cmd = new SqlCommand(query, con))
{
    cmd.CommandType = CommandType.StoredProcedure;
    cmd.CommandTimeout = 3600; // 1 hour
    await cmd.ExecuteNonQueryAsync();
}
Chen
  • 4,499
  • 1
  • 2
  • 9
  • As I mentioned in my original post, I have `CommandTimeout` already set to 60 seconds. I added some new infos to my original post, where you can see, that the job in different instances failed after just 27 seconds, some time after 34 seconds and another time after 37 seconds. That's all well below the `CommandTimeout` of 60 seconds. – roli09 Feb 02 '23 at 06:53
  • Hi @roli09, In fact it seems to me that these are two different configurations, since `SqlCommand` seems to have its own commandtimeout. I have encountered this kind of problem, and the same problem still occurs when setting the `CommandTimeout` property of `SqlServerStorageOptions`. – Chen Feb 02 '23 at 07:15
  • If that's true, then the question is: **Where** can I increase the `SQLCommand` timeout, if **NOT** in `.UseSqlServerStorage()`? Because as you can see in the Stacktrace, that is all code from `Hangfire`, I'm not instantiating these failing `SQLCommand` objects ;) – roli09 Feb 02 '23 at 07:46