Application logging with ELK stack

Question

I am using NLog with the Elasticsearch target to forward logs to an AWS Elasticsearch Service cluster for visualisation in Kibana.

This works fine, but I am concerned about using it in production because of ES cluster availability and the impact a cluster failover has when the logs are sent using the elasticsearch-net client over HTTP.

I am considering a different NLog target that sends the logs to a more reliable destination (File, S3?) and then having something else (Logstash, AWS Lambda) pick them up and send them to ES, thereby minimising the risk to the application itself.

I would like to hear your thoughts.

UPDATE

My main concern is app availability; to avoid missing logs, a secondary target is used.

I am using the latest NLog with throwExceptions set to false. I am not using async targets at this point, but I am considering them as we have a lot of async code.

To give a bit more context, the "app" is a set of APIs (WebAPI and WCF) which receive 10-15K requests per minute (RPM).

Scenario

A request comes in and the ES cluster is unavailable.

Case 1 - NLog without async target

<nlog xmlns="http://www.nlog-project.org/schemas/NLog.xsd"
        xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
        xsi:schemaLocation="http://www.nlog-project.org/schemas/NLog.xsd NLog.xsd"
        autoReload="true"
        throwExceptions="false"
        internalLogLevel="Off"
        internalLogFile="c:\temp\nlog-internal.log">

    <targets>
      <target name="elastic"
              xsi:type="BufferingWrapper"
              flushTimeout="5000">
        <target xsi:type="ElasticSearch"
                layout="${logger} | ${threadid} | ${message}"
                index="logstash-${date:format=yyyy.MM.dd}"
                includeAllProperties="true"
                uri="...">

          <field name="user"
                 layout="${windows-identity:userName=True:domain=False}"/>
          <field name="host"
                 layout="${machinename}"/>
          <field name="number"
                 layout="1"
                 layoutType="System.Int32"/>

        </target>
      </target>
    </targets>
    <rules>
      <logger name="*"
              minlevel="Debug"
              writeTo="elastic" />
    </rules>
</nlog>

Q:

what happens with the main thread when target can't be reached?

Case 2 - NLog with async target

Using an async wrapper around the Elasticsearch target with queueLimit="10000" and batchSize="100".
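
For reference, a minimal sketch of how such a wrapper might be declared. This is a fragment that would sit inside the `<nlog>` element from Case 1; the target name `elasticAsync` is only an illustrative assumption, and `uri` is left elided as above.

```xml
<targets>
  <!-- AsyncWrapper queues log events and writes them to the wrapped target in batches -->
  <target name="elasticAsync"
          xsi:type="AsyncWrapper"
          queueLimit="10000"
          batchSize="100">
    <target xsi:type="ElasticSearch"
            index="logstash-${date:format=yyyy.MM.dd}"
            includeAllProperties="true"
            uri="..." />
  </target>
</targets>
<rules>
  <!-- Route everything at Debug and above to the wrapped target -->
  <logger name="*"
          minlevel="Debug"
          writeTo="elasticAsync" />
</rules>
```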

Q:

is another thread [B] created?
will subsequent requests reuse thread [B] and queue the logging requests?
what happens when the queueLimit is reached?
will additional threads [B1 ... Bn] be started? (this will flood connection pool)

Julian · Accepted Answer · 2018-10-07T18:19:25.090

Good question.

There is nothing to worry about, but correct configuration of NLog is important.

I'm not sure which one needs to be reliable, running the program or not losing a log message, so here are both cases:

If you are afraid of losing log messages:
- Write to multiple targets (from NLog), e.g. File and Elasticsearch.
- Optionally, use a FallbackGroup wrapper (in case of an error when writing to the target).
- If async is enabled, check the overflow/queue settings. Discard is enabled by default (to protect against CPU or memory overload).

If you are afraid that logging could break your application:
- Use the latest stable version of NLog.
- Don't enable throwExceptions (it is disabled by default).
- If you enable async, log events are written to the target on another thread, so errors there cannot break your app.
- Also, when using async, check the overflow and queue settings.
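
As a rough illustration of the "multiple targets" advice, the fragment below writes every event both to a local file and to an async-wrapped Elasticsearch target. It is a sketch rather than a tested production config: the target names, the file path and the file layout are assumptions made up for the example, and `uri` is left elided as in the question.

```xml
<targets>
  <!-- Local copy: still written while the ES cluster is unreachable -->
  <target name="file"
          xsi:type="File"
          fileName="c:\temp\app-${shortdate}.log"
          layout="${longdate} | ${level} | ${logger} | ${message} ${exception:format=tostring}" />

  <!-- Elasticsearch writes happen off the request thread -->
  <target name="elastic"
          xsi:type="AsyncWrapper">
    <target xsi:type="ElasticSearch"
            index="logstash-${date:format=yyyy.MM.dd}"
            includeAllProperties="true"
            uri="..." />
  </target>
</targets>
<rules>
  <!-- A comma-separated writeTo sends each event to both targets -->
  <logger name="*"
          minlevel="Debug"
          writeTo="file,elastic" />
</rules>
```

With this layout the file acts as the reliable record, and something like Logstash could later ship it if events were dropped on the Elasticsearch side.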

Update

Case 1,

what happens with the main thread when target can't be reached?

Nothing. The main thread queues the messages in a buffer, and another (Timer) thread processes them. If that fails and throwExceptions is not enabled, the errors are only written to the internal log (when enabled). All exceptions are caught. You will lose the message when writing to the target fails.
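
If losing those messages is a problem, one possible approach is sketched below: enable the internal log so failures are visible, and put the Elasticsearch target in a FallbackGroup with a file as second choice. The names `elasticPrimary` and `fileFallback` and the fallback path are assumptions for the example, and it relies on the Elasticsearch target reporting write failures back to the group, which is worth verifying for the version in use.

```xml
<nlog xmlns="http://www.nlog-project.org/schemas/NLog.xsd"
      xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
      throwExceptions="false"
      internalLogLevel="Warn"
      internalLogFile="c:\temp\nlog-internal.log">
  <!-- internalLogLevel="Warn": target failures end up in the internal log file -->
  <targets>
    <!-- FallbackGroup tries targets in order and moves on when a write fails -->
    <target name="elastic"
            xsi:type="FallbackGroup"
            returnToFirstOnSuccess="true">
      <target name="elasticPrimary"
              xsi:type="ElasticSearch"
              index="logstash-${date:format=yyyy.MM.dd}"
              includeAllProperties="true"
              uri="..." />
      <!-- Hypothetical fallback file, used only when the first target fails -->
      <target name="fileFallback"
              xsi:type="File"
              fileName="c:\temp\elastic-fallback.log" />
    </target>
  </targets>
  <rules>
    <logger name="*"
            minlevel="Debug"
            writeTo="elastic" />
  </rules>
</nlog>
```

The group could also be placed inside the BufferingWrapper from the question; whether batching towards Elasticsearch is preserved in that arrangement is something to test.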

Case 2,

is another thread [B] created?

One Timer is created. It uses a thread to process the messages.

will subsequent requests reuse thread [B] and queue the logging requests?

Yes, but there is no guarantee it will be the same thread. The timer takes a thread from the pool. NB: only one such thread is active at a time.

what happens when the queueLimit is reached?

That depends on your configuration. By default it discards, as stated above (see the overflow/queue settings). This is the safest option in terms of memory and CPU. You can choose to discard, block (which stalls the main thread), or grow the queue (be aware of memory usage).
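
For illustration, the three choices map to the `overflowAction` attribute of the AsyncWrapper. The fragment below is a sketch using the queue settings from the question; the target name is an assumption and `uri` stays elided.

```xml
<target name="elasticAsync"
        xsi:type="AsyncWrapper"
        queueLimit="10000"
        batchSize="100"
        overflowAction="Discard">
  <!-- overflowAction="Discard": events are dropped when the queue is full (default) -->
  <!-- overflowAction="Block":   the logging thread waits until there is room again -->
  <!-- overflowAction="Grow":    the queue grows past queueLimit, so watch memory -->
  <target xsi:type="ElasticSearch"
          uri="..." />
</target>
```

For an API at this request rate, Discard keeps request threads from ever waiting on Elasticsearch, at the cost of dropped events during an outage.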

will additional threads [B1 ... Bn] be started? (this will flood connection pool)

No. 1 Timer, 1 threadpool. For details check the MSDN page for Timer, or the reference source.

Didn't consider the throwExceptions flag. Thanks for pointing that out! I've updated the question, can you please take another look ? — thedev, Nov 28 '16 at 11:09
