
I have some server logs dumped into Elasticsearch. The logs contain entries like 'action_id':'AU11nP1mYXS3pt6INMtU','action':'start','time':'March 31st 2015, 19:42:07.121' and 'action_id':'AU11nP1mYXS3pt6INMtU','action':'complete','time':'March 31st 2015, 23:06:00.271'. Entries with the same action_id belong to a single action, and I'm interested in how long each action took to complete.

I don't really know the Elasticsearch way of framing my question, but I'll try my best: how can I build an aggregation on 'action_id' using a custom metric defined as the time span between 'action':'start' and 'action':'complete'?

I'm using Kibana for visualization, if that helps.

lingxiao

2 Answers


I took the example given in the scripted metric aggregation documentation and adapted it to this problem:

{
   "aggs": {
      "actions": {
         "terms": {
            "field": "action_id"
         },
         "aggs": {
            "duration": {
               "scripted_metric": {
                  "init_script": "_agg['delta'] = 0",
                  "map_script": "if (doc['action'].value == \"complete\"){ _agg.delta += doc['time'].value } else {_agg.delta -= doc['time'].value}",
                  "combine_script": "return _agg.delta",
                  "reduce_script": "duration = 0; for (d in _aggs) { duration += d }; return duration"
               }
            }
         }
      }
   }
}

First, the terms aggregation creates one bucket per action_id.

Then a scripted metric is calculated for each bucket.

In the map step, each shard adds 'complete' timestamps as positive values and all other timestamps (i.e. the 'start' ones) as negative values. The combine step simply returns each shard's partial sum. The reduce step then sums the partial sums over all shards (the 'start' and 'complete' events of one action may be on different shards) to get the actual duration.
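To make the sign trick concrete, here is a minimal Python sketch that simulates the map, combine and reduce steps outside of Elasticsearch. It assumes the 'time' field is available as an ISO 8601 string that can be converted to epoch milliseconds; the helper names (`to_millis`, `map_and_combine`, `reduce_shards`) and the split of documents across two shards are made up for illustration.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_millis(ts: str) -> int:
    """Convert an ISO 8601 timestamp (assumed UTC) to epoch milliseconds."""
    dt = datetime.fromisoformat(ts).replace(tzinfo=timezone.utc)
    return (dt - EPOCH) // timedelta(milliseconds=1)


def map_and_combine(shard_docs: list) -> int:
    """Per shard: add 'complete' timestamps, subtract all others."""
    delta = 0
    for doc in shard_docs:
        if doc["action"] == "complete":
            delta += to_millis(doc["time"])
        else:
            delta -= to_millis(doc["time"])
    return delta


def reduce_shards(partial_sums: list) -> int:
    """Across shards: sum the partial results to get the duration."""
    return sum(partial_sums)


# Hypothetical layout: the two events of one action sit on different shards.
shard_0 = [{"action": "start", "time": "2015-03-31T19:42:07.121"}]
shard_1 = [{"action": "complete", "time": "2015-03-31T23:06:00.271"}]

duration_ms = reduce_shards([map_and_combine(shard_0), map_and_combine(shard_1)])
print(duration_ms)  # 12233150 ms, i.e. 3 h 23 min 53.150 s
```

Note that in this scheme an action with a 'start' event but no 'complete' event yet produces a large negative value, so such buckets would need to be filtered out or ignored.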

I'm not sure how well this aggregation performs, but you can try it out on your dataset. Please note that the scripted metric aggregation is still marked as experimental.

tiurin
  • Do you know if it is possible to do the same with scripted fields (or any other option) in Kibana 4? – Guido Jun 05 '15 at 08:44
  • This might be correct, but I couldn't get it to work, especially not with Kibana. It seems easy enough to achieve with Logstash and reindexing, though. I don't have a problem with reindexing; it's kind of inevitable when working with Elasticsearch anyway... – lingxiao Jun 24 '15 at 06:57

It looks like Elasticsearch is not designed to calculate time durations directly. This kind of task seems to be better handled in Logstash at indexing time, using the elasticsearch filter plugin:

https://www.elastic.co/guide/en/logstash/current/plugins-filters-elasticsearch.html

if [action] == "complete" {
  elasticsearch {
    hosts => ["es-server"]
    query => "action:start AND action_id:%{[action_id]}"
    fields => ["time", "started"]
  }

  date {
    match => ["[started]", "ISO8601"]
    target => "[started]"
  }

  ruby {
    code => "event['duration_hrs'] = (event['@timestamp'] - event['started']) / 3600 rescue nil"
  }
}
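
The idea behind this filter chain can be sketched in plain Python: when a 'complete' event arrives, look up the matching 'start' event by action_id and store the difference in hours. This is only an illustration of the logic, not Logstash code. The dictionary `start_events` is a hypothetical in-memory stand-in for the Elasticsearch lookup, and the timestamps are assumed to be ISO 8601 strings.

```python
from datetime import datetime

# Hypothetical stand-in for the "action:start AND action_id:..." query.
start_events = {
    "AU11nP1mYXS3pt6INMtU": "2015-03-31T19:42:07.121",
}


def enrich(event: dict) -> dict:
    """Add 'duration_hrs' to a 'complete' event if its 'start' is known."""
    if event.get("action") != "complete":
        return event
    started = start_events.get(event["action_id"])
    if started is None:
        # Assumption of this sketch: leave the event unchanged when no
        # matching 'start' event is found.
        return event
    elapsed = datetime.fromisoformat(event["time"]) - datetime.fromisoformat(started)
    enriched = dict(event)
    enriched["duration_hrs"] = elapsed.total_seconds() / 3600
    return enriched


complete_event = {
    "action_id": "AU11nP1mYXS3pt6INMtU",
    "action": "complete",
    "time": "2015-03-31T23:06:00.271",
}
print(enrich(complete_event)["duration_hrs"])  # roughly 3.4 hours
```

Because the duration is stored on the 'complete' document itself, it can then be aggregated or plotted like any other numeric field.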
lingxiao