We are using Serilog file sink and Elastic search sink that both use a Serilog async sink to make sure everything is handled in a background thread. The last days we had server-wide issues and all applications slowed down. There was a full user dump taken by the system engineers (with debug diagnostic tool) and upon inspection of that dump, it looks like we have an issue with the Serilog.Sinks.Async.BackgroundWorkerSink
This is one of the ~3000 threads in the clrstack :
OS Thread Id: 0x30b7c
Child SP IP Call Site
00000025AAA8F048 00007ffd91720bb4 [HelperMethodFrame_1OBJ: 00000025aaa8f048] System.Threading.Monitor.ObjWait(Int32, System.Object)
00000025AAA8F170 00007ffce4454638 System.Threading.SemaphoreSlim.WaitUntilCountOrTimeout(Int32, UInt32, System.Threading.CancellationToken) [/_/src/libraries/System.Private.CoreLib/src/System/Threading/SemaphoreSlim.cs @ 462]
00000025AAA8F1C0 00007ffce43f1b4a System.Threading.SemaphoreSlim.Wait(Int32, System.Threading.CancellationToken) [/_/src/libraries/System.Private.CoreLib/src/System/Threading/SemaphoreSlim.cs @ 365]
00000025AAA8F270 00007ffce44b1e27 System.Collections.Concurrent.BlockingCollection`1[[System.__Canon, System.Private.CoreLib]].TryTakeWithNoTimeValidation(System.__Canon ByRef, Int32, System.Threading.CancellationToken, System.Threading.CancellationTokenSource)
00000025AAA8F2F0 00007ffce44b1ce4 System.Collections.Concurrent.BlockingCollection`1+d__68[[System.__Canon, System.Private.CoreLib]].MoveNext()
00000025AAA8F340 00007ffce2ede22d Serilog.Sinks.Async.BackgroundWorkerSink.Pump()
00000025AAA8F390 00007ffce43d6617 System.Threading.ExecutionContext.RunInternal(System.Threading.ExecutionContext, System.Threading.ContextCallback, System.Object) [/_/src/libraries/System.Private.CoreLib/src/System/Threading/ExecutionContext.cs @ 183]
00000025AAA8F400 00007ffce44124fe System.Threading.Tasks.Task.ExecuteWithThreadLocal(System.Threading.Tasks.Task ByRef, System.Threading.Thread) [/_/src/libraries/System.Private.CoreLib/src/System/Threading/Tasks/Task.cs @ 2333]
00000025AAA8F700 00007ffd421eaed3 [DebuggerU2MCatchHandlerFrame: 00000025aaa8f700]
It looks like thousands of threads have a SemaphoreSlim.Wait, just waiting on a thread to become available. Is there anything we misconfigured? Or that we can do to optimise ? We believe the issue is most likely something with the file sink...
Things we are wondering about are :
- Is it wise to have "shared:true" for the file sink?
- Is it wise to combine the elastic search sink with the async sink?
- Isn't Serilog supposed to use the Treadpool? Why then do we have ~3000 threads?
This is our (simplified) configuration :
var logCfg =
new LoggerConfiguration()
.Enrich.WithProperty("machine", System.Environment.MachineName)
.WriteTo.Map(keyPropertyName: "$filename", defaultKey: "fallback",
configure: (fileName, wt) =>
wt.Async(c =>
c.File(formatter: formatter
, path: logOptions.AuditPath
, shared: true
, fileSizeLimitBytes: fileSizeLimitBytes ?? 41943040
, rollingInterval: RollingInterval.Day
, rollOnFileSizeLimit: true
)
)
)
.WriteTo.Async(c =>
c.Elasticsearch(new Serilog.Sinks.Elasticsearch.ElasticsearchSinkOptions(new Uri(elasticUrl))
{
IndexFormat = "my-audit-" + DateTime.Now.Year,
ModifyConnectionSettings = x => x.MyAuthentication(elasticCreds[0], elasticCreds[1])
}
)
);
}
If it matters, these are our nuget versions :
- Serilog.Sinks.Async 1.5.0.0
- Serilog.sinks.Elasticsearch 8.4.1
- Serilog.Sinks.File 5.0.0