
I am running ELK (Elasticsearch, Logstash, Kibana) in a cluster where Docker containers are running. Those containers send their logs to Logstash via a GELF endpoint.

docker run --log-driver=gelf --log-opt gelf-address=udp://somehost:12201 -ti my-app:latest

I then process the logs in Logstash. Here, I want to merge multiline messages into a single event (Java exceptions, in my case). My config is:

input {
    gelf {}
}
filter {
    multiline {
        pattern => "^%{TIMESTAMP_ISO8601}"
        negate => true
        what => "previous"
        source => "short_message"
    }
}
output {
    stdout { codec => rubydebug }
}

It works perfectly when I process logs from one Docker container, but with two or more it does not work, because it merges lines from both (or more) log streams into the same events.

I expected that configuring multiline handling in the input would solve the problem:

input {
    gelf {
        filter {
            multiline {
                pattern => "^%{TIMESTAMP_ISO8601}"
                negate => true
                what => "previous"
            }
        }
    }
}

but multiline handling does not work correctly with this setup (apparently because of a bug). Any suggestions? Thanks.

I am using: Docker 1.9.1, Logstash 2.1

jiri463

1 Answer


We solved it by using the 'stream_identity' option of the multiline filter. From the documentation:

The stream identity is how the multiline filter determines which stream an event belongs to. This is generally used for differentiating, say, events coming from multiple files in the same file input, or multiple connections coming from a tcp input.

https://www.elastic.co/guide/en/logstash/current/plugins-filters-multiline.html#plugins-filters-multiline-stream_identity

As you are using GELF, you can combine the host and container_id fields to uniquely identify the stream each message belongs to:

filter {
  multiline {
    pattern => "^%{TIMESTAMP_ISO8601}"
    negate => true
    what => "previous"
    source => "short_message"
    stream_identity => "%{host}.%{container_id}"
  }
}
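
To make the effect of stream_identity concrete, here is a minimal Python sketch of the idea. It is not Logstash's actual implementation: the function name `merge_multiline`, the simplified timestamp regex standing in for `%{TIMESTAMP_ISO8601}`, and the sample events are all assumptions made for illustration. It keeps one pending buffer per stream key, so continuation lines attach only to the previous event of the same container.

```python
import re

# Simplified stand-in for the ^%{TIMESTAMP_ISO8601} grok pattern (assumption).
STARTS_EVENT = re.compile(r"^\d{4}-\d{2}-\d{2}[T ]\d{2}:\d{2}:\d{2}")


def merge_multiline(events, identity):
    """Merge continuation lines into the previous event of the same stream.

    `events` is an iterable of dicts with a "short_message" key.
    `identity` maps an event to its stream key (the role of stream_identity).
    Returns a list of (stream key, merged message) tuples.
    """
    pending = {}  # stream key -> lines of the event still being assembled
    merged = []
    for event in events:
        key = identity(event)
        line = event["short_message"]
        if STARTS_EVENT.match(line):
            # A timestamp starts a new event: flush this stream's buffered one.
            if key in pending:
                merged.append((key, "\n".join(pending.pop(key))))
            pending[key] = [line]
        else:
            # negate => true, what => "previous": append to the buffered event.
            pending.setdefault(key, []).append(line)
    # Flush whatever is still buffered at the end of the input.
    for key, lines in pending.items():
        merged.append((key, "\n".join(lines)))
    return merged


# Hypothetical interleaved events from two containers on the same host.
events = [
    {"host": "h1", "container_id": "aaa",
     "short_message": "2016-01-01 10:00:00 ERROR boom"},
    {"host": "h1", "container_id": "bbb",
     "short_message": "2016-01-01 10:00:00 INFO started"},
    {"host": "h1", "container_id": "aaa",
     "short_message": "    at com.example.Foo.bar(Foo.java:1)"},
    {"host": "h1", "container_id": "bbb",
     "short_message": "2016-01-01 10:00:01 INFO ready"},
]

# One buffer per host/container pair, like "%{host}.%{container_id}".
per_stream = merge_multiline(
    events, lambda e: f'{e["host"]}.{e["container_id"]}')

# A single shared buffer, which is roughly the situation in the question.
single = merge_multiline(events, lambda e: "all")
```

With the per-container key, the stack trace line stays attached to the "ERROR boom" event of container aaa. With the single shared key, it is appended to container bbb's "INFO started" event, which is the mixing described in the question.
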
  • Won't this cause performance issues? From my understanding, when the multiline filter is used Logstash drops down to one thread so that messages can be kept in order. – jmreicha Sep 01 '16 at 03:07
  • @jmreicha I think it does indeed have an impact. By the way, since writing my answer we have changed how we handle logging for our Java services: we now use [the logstash logback appender](https://github.com/logstash/logstash-logback-encoder). It's a completely different approach, so it certainly does not answer the original question. I think you should choose the approach based on your situation. If you don't send a lot of logs, the multiline approach could be good enough. It was for us until we wanted more flexibility. – Toni Van de Voorde Sep 01 '16 at 11:18
  • Thanks for the additional info, the multiline filter is the best approach I have found so far. – jmreicha Sep 01 '16 at 15:30