9

I have a message that flows through several systems, each system logs message entry and exit with a timestamp and a uuid messageId. I'm ingesting all logs through:

filebeat --> logstash --> elastic search --> kibana

As a result I now have these events:

@timestamp                      messageId                               event 
May 19th 2016, 02:55:29.003     00e02f2f-32d5-9509-870a-f80e54dc8775    system1Enter
May 19th 2016, 02:55:29.200     00e02f2f-32d5-9509-870a-f80e54dc8775    system1Exit
May 19th 2016, 02:55:29.205     00e02f2f-32d5-9509-870a-f80e54dc8775    system2Enter
May 19th 2016, 02:55:29.453     00e02f2f-32d5-9509-870a-f80e54dc8775    system2Exit

I would like to produce a report (ideally a stacked bar or column) of time spent in each system:

messageId                               in1:1->2:in2
00e02f2f-32d5-9509-870a-f80e54dc8775    197:5:248

What is the best way to do this? Logstash filters? kibana calculated fields?

RaGe
  • 22,696
  • 11
  • 72
  • 104
  • There's a good example of this described in the logstash [aggregate filter](https://www.elastic.co/guide/en/logstash/current/plugins-filters-aggregate.html) – Val May 20 '16 at 19:01
  • @Val, Hmm that example needs the elapsed time to be in the log line - doesn't calculate it from two separate log lines. Maybe I can use that along with the `Elapsed{}` plugin somehow. – RaGe May 20 '16 at 19:41

2 Answers2

19

You can achieve this with the Logstash aggregate filter only, however, you'd have to substantially re-implement what the elapsed filter already does, so that'd be a shame, right?

Let's then use a mix of the Logstash aggregate filter and the elapsed filter. The latter is used to measure the time of each stage and the former is used to aggregate all the timing information into the last event.

Side note: you might want to rethink your timestamp format to make it something more standard for parsing. I've transformed them to ISO 8601 to make it easier to parse, but feel free to roll your own regex.

So I'm starting from the following logs:

2016-05-19T02:55:29.003 00e02f2f-32d5-9509-870a-f80e54dc8775 system1Enter
2016-05-19T02:55:29.200 00e02f2f-32d5-9509-870a-f80e54dc8775 system1Exit
2016-05-19T02:55:29.205 00e02f2f-32d5-9509-870a-f80e54dc8775 system2Enter
2016-05-19T02:55:29.453 00e02f2f-32d5-9509-870a-f80e54dc8775 system2Exit

First I'm using three elapsed filters (one for each stage in1, 1->2 and in2) and then three aggregate filters in order to gather all the timing information. It looks like this:

filter {
  grok {
    match => ["message", "%{TIMESTAMP_ISO8601:timestamp} %{UUID:messageId} %{WORD:event}"]
    add_tag => [ "%{event}" ]
  }
  date {
    match => [ "timestamp", "ISO8601"]
  }
  # Measures the execution time of system1
  elapsed {
    unique_id_field => "messageId"
    start_tag => "system1Enter"
    end_tag => "system1Exit"
    new_event_on_match => true
    add_tag => ["in1"]
  }
  # Measures the execution time of system2
  elapsed {
    unique_id_field => "messageId"
    start_tag => "system2Enter"
    end_tag => "system2Exit"
    new_event_on_match => true
    add_tag => ["in2"]
  }
  # Measures the time between system1 and system2
  elapsed {
    unique_id_field => "messageId"
    start_tag => "system1Exit"
    end_tag => "system2Enter"
    new_event_on_match => true
    add_tag => ["1->2"]
  }
  # Records the execution time of system1
  if "in1" in [tags] and "elapsed" in [tags] {
    aggregate {
      task_id => "%{messageId}"
      code => "map['report'] = [(event['elapsed_time']*1000).to_i]"
      map_action => "create"
    }
  }
  # Records the time between system1 and system2
  if "1->2" in [tags] and "elapsed" in [tags] {
    aggregate {
      task_id => "%{messageId}"
      code => "map['report'] << (event['elapsed_time']*1000).to_i"
      map_action => "update"
    }
  }
  # Records the execution time of system2
  if "in2" in [tags] and "elapsed" in [tags] {
    aggregate {
      task_id => "%{messageId}"
      code => "map['report'] << (event['elapsed_time']*1000).to_i; event['report'] = map['report'].join(':')"
      map_action => "update"
      end_of_task => true
    }
  }
}

After the first two events, you'll get a new event like this, which shows that 197ms have been spent in system1:

{
                 "@timestamp" => "2016-05-21T04:20:51.731Z",
                       "tags" => [ "elapsed", "elapsed_match", "in1" ],
               "elapsed_time" => 0.197,
                  "messageId" => "00e02f2f-32d5-9509-870a-f80e54dc8775",
    "elapsed_timestamp_start" => "2016-05-19T00:55:29.003Z"
}

After the third event, you'll get an event like this, which shows how much time is spent between system1 and system2, i.e. 5ms:

{
                 "@timestamp" => "2016-05-21T04:20:51.734Z",
                       "tags" => [ "elapsed", "elapsed_match", "1->2" ],
               "elapsed_time" => 0.005,
                  "messageId" => "00e02f2f-32d5-9509-870a-f80e54dc8775",
    "elapsed_timestamp_start" => "2016-05-19T00:55:29.200Z"
}

After the fourth event, you'll get a new event like this one, which shows how much time was spent in system2, i.e. 248ms. That event also contains a report field with all the timing information of the message

{
                 "@timestamp" => "2016-05-21T04:20:51.736Z",
                       "tags" => [ "elapsed", "elapsed_match", "in2" ],
               "elapsed_time" => 0.248,
                  "messageId" => "00e02f2f-32d5-9509-870a-f80e54dc8775",
    "elapsed_timestamp_start" => "2016-05-19T00:55:29.205Z"
                     "report" => "197:5:248"
}
Val
  • 207,596
  • 13
  • 358
  • 360
  • Thanks val, I was going down the same rabbit-hole, this helps me get there faster. The thing I noticed about the elapsed filters is that since each of my log lines matches only one of the several elapsed filters, all the non matching ones add a `_grokparsefailure` tag. I added a `tag_on_failure => [ ]` to my elapsed filters. – RaGe May 21 '16 at 14:20
  • Trying to figure out the errors I'm seeing with `to_i`, apparently the input in the map lines is an array instead of a string. `Aggregate exception occurred. Error: undefined method `to_i' for [197.0]:Array` – RaGe May 23 '16 at 19:40
  • Sorry, my bad, I had a typo in the code for the first aggregate (square bracket at wrong place), I've updated the answer, please try again. – Val May 23 '16 at 19:47
  • Ah, that's how you create a map in rb. I corrected your typo by getting rid of the square braces entirely, and was having issues with the join. All good now. Thank you for your help! – RaGe May 23 '16 at 20:11
  • Do you know if there is a way to change the date of the events that elapsed or aggregate create? – RaGe May 23 '16 at 20:19
  • making that a new question. http://stackoverflow.com/questions/37400178/change-date-on-event-created-by-elapsed-or-aggregate-filters – RaGe May 23 '16 at 20:42
  • I've answered that question ;-) – Val May 24 '16 at 02:56
2

I had to make some tweaks for this work in logstash 5.4, here is the revised code.

filter {
  grok {
    match => ["message", "%{TIMESTAMP_ISO8601:timestamp} %{UUID:messageId} %{WORD:event}"]
    add_tag => [ "%{event}" ]
  }
  date {
    match => [ "timestamp", "ISO8601"]
  }
  # Measures the execution time of system1
  elapsed {
    unique_id_field => "messageId"
    start_tag => "system1Enter"
    end_tag => "system1Exit"
    new_event_on_match => true
    add_tag => ["in1"]
  }
  # Measures the time between system1 and system2
  elapsed {
    unique_id_field => "messageId"
    start_tag => "system1Exit"
    end_tag => "system2Enter"
    new_event_on_match => true
    add_tag => ["1->2"]
  }
  # Measures the execution time of system2
  elapsed {
    unique_id_field => "messageId"
    start_tag => "system2Enter"
    end_tag => "system2Exit"
    new_event_on_match => true
    add_tag => ["in2"]
  }
  # Records the execution time of system1
  if "in1" in [tags] and "elapsed" in [tags] {
    aggregate {
      task_id => "%{messageId}"
      code => "map['report'] = (event.get('elapsed_time')*1000).to_i"
      map_action => "create"
    }
  } 
  # Records the time between system1 and system2
  if "1->2" in [tags] and "elapsed" in [tags] {
    aggregate {
      task_id => "%{messageId}"
      code => "map['report'] << (event.get('elapsed_time')*1000).to_i"
      map_action => "update"
    }
  }
  # Records the execution time of system2
  if "in2" in [tags] and "elapsed" in [tags] {
    aggregate {
      task_id => "%{messageId}"
      code => "map['report'] << (event.get('elapsed_time')*1000).to_i"
      map_action => "update"
      end_of_task => true
    }
  }
}
Mr-H
  • 21
  • 1
  • Elapsed and aggregate filters will not work properly when multiple logstash workers are used. In that case, how do we solve the same problem? Thanks. – Nikhil Utane Nov 22 '17 at 03:57
  • You force it to use a single working. This could be a bottleneck but other items don't have to follow the same path – Mr-H Dec 12 '17 at 15:12