1

We are using Serilog file sink and Elastic search sink that both use a Serilog async sink to make sure everything is handled in a background thread. The last days we had server-wide issues and all applications slowed down. There was a full user dump taken by the system engineers (with debug diagnostic tool) and upon inspection of that dump, it looks like we have an issue with the Serilog.Sinks.Async.BackgroundWorkerSink

This is one of the ~3000 threads in the clrstack :

OS Thread Id: 0x30b7c
        Child SP               IP Call Site
00000025AAA8F048 00007ffd91720bb4 [HelperMethodFrame_1OBJ: 00000025aaa8f048] System.Threading.Monitor.ObjWait(Int32, System.Object)
00000025AAA8F170 00007ffce4454638 System.Threading.SemaphoreSlim.WaitUntilCountOrTimeout(Int32, UInt32, System.Threading.CancellationToken) [/_/src/libraries/System.Private.CoreLib/src/System/Threading/SemaphoreSlim.cs @ 462]
00000025AAA8F1C0 00007ffce43f1b4a System.Threading.SemaphoreSlim.Wait(Int32, System.Threading.CancellationToken) [/_/src/libraries/System.Private.CoreLib/src/System/Threading/SemaphoreSlim.cs @ 365]
00000025AAA8F270 00007ffce44b1e27 System.Collections.Concurrent.BlockingCollection`1[[System.__Canon, System.Private.CoreLib]].TryTakeWithNoTimeValidation(System.__Canon ByRef, Int32, System.Threading.CancellationToken, System.Threading.CancellationTokenSource)
00000025AAA8F2F0 00007ffce44b1ce4 System.Collections.Concurrent.BlockingCollection`1+d__68[[System.__Canon, System.Private.CoreLib]].MoveNext()
00000025AAA8F340 00007ffce2ede22d Serilog.Sinks.Async.BackgroundWorkerSink.Pump()
00000025AAA8F390 00007ffce43d6617 System.Threading.ExecutionContext.RunInternal(System.Threading.ExecutionContext, System.Threading.ContextCallback, System.Object) [/_/src/libraries/System.Private.CoreLib/src/System/Threading/ExecutionContext.cs @ 183]
00000025AAA8F400 00007ffce44124fe System.Threading.Tasks.Task.ExecuteWithThreadLocal(System.Threading.Tasks.Task ByRef, System.Threading.Thread) [/_/src/libraries/System.Private.CoreLib/src/System/Threading/Tasks/Task.cs @ 2333]
00000025AAA8F700 00007ffd421eaed3 [DebuggerU2MCatchHandlerFrame: 00000025aaa8f700]

It looks like thousands of threads have a SemaphoreSlim.Wait, just waiting on a thread to become available. Is there anything we misconfigured? Or that we can do to optimise ? We believe the issue is most likely something with the file sink...

Things we are wondering about are :

  • Is it wise to have "shared:true" for the file sink?
  • Is it wise to combine the elastic search sink with the async sink?
  • Isn't Serilog supposed to use the Treadpool? Why then do we have ~3000 threads?

This is our (simplified) configuration :

        var logCfg =
            new LoggerConfiguration()
            .Enrich.WithProperty("machine", System.Environment.MachineName)
            .WriteTo.Map(keyPropertyName: "$filename", defaultKey: "fallback",
                configure: (fileName, wt) =>
                     wt.Async(c =>
                         c.File(formatter: formatter
                         , path: logOptions.AuditPath
                         , shared: true
                         , fileSizeLimitBytes: fileSizeLimitBytes ?? 41943040
                         , rollingInterval: RollingInterval.Day
                         , rollOnFileSizeLimit: true
                         )
                )
            )
            .WriteTo.Async(c =>
                       c.Elasticsearch(new Serilog.Sinks.Elasticsearch.ElasticsearchSinkOptions(new Uri(elasticUrl))
                       {
                           IndexFormat = "my-audit-" + DateTime.Now.Year,
                           ModifyConnectionSettings = x => x.MyAuthentication(elasticCreds[0], elasticCreds[1])
                       }
                   )

            );
        }

If it matters, these are our nuget versions :

  • Serilog.Sinks.Async 1.5.0.0
  • Serilog.sinks.Elasticsearch 8.4.1
  • Serilog.Sinks.File 5.0.0
AardVark71
  • 3,928
  • 2
  • 30
  • 50

1 Answers1

2

Most important bit is that you want the Async Sink around the Map, not the inverse as you have shown.

The current impl of the Async Sink uses GetConsumingEnumerable which consumes a thread. That works great for perf but is not optimal of course (think there is a GH Issue about switching it to Channels, which would alleviate that but might make its perf worse). Regardless of that, you only want one of them.

The high level on this is:

  • Capturing values etc happens on the actual execution thread; that produces a LogEvent structure. That can't / doesn't need to be parallelised.
  • The sink wireup then defines routing of LogEvents from across all logging threads and dispatches them. If you use Async, it stuffs them into a queue for the consumer to grab as and when it can instead of doing it immediately. (Go read it, it's really neat and short and will render the rest obvious)

So having two Async sinks like you have is costing more - you want an outer Async, and the do the Elastic followed by the File (normally something like Elastic will do internal buffering and hence will take the data quickly)

Ruben Bartelink
  • 59,778
  • 26
  • 187
  • 249