I am indexing Tomcat access-log data into Elasticsearch (1.7.3). Each document represents a request and carries its end time and its duration in milliseconds. The start time can be calculated from these, and I can store it as well if that helps solve my problem. For example:

{
  "ztime": "10-17-2015T04:05:00.000+02:00",
  "duration": 4500,
  "thread": "http-nio-8080-exec-14"
},
{
  "ztime": "10-17-2015T04:07:42.227+02:00",
  "duration": 3100,
  "thread": "http-nio-8080-exec-25"
}

My goal is to produce a histogram that shows, for each second, how many threads were active.

I thought of using a date_histogram that aggregates my docs into 1-second buckets:

GET /mindex/mtype/_search?search_type=count
{
  "aggs": {
      "threads_per_hr": {
        "date_histogram": {
          "field": "ztime",
          "interval": "1s",
          "min_doc_count": 1
        },
       "aggs": {
          "per_hr_threads": {
             "cardinality": {
                "field": "thread"
             }
          }
       }
      }
  }
}

However, this way each document (and so each thread) is bucketized only once, into the bucket that contains its ztime.

What I need is for each doc to be bucketized into several buckets. For example, I will need the first document to be bucketized into the 04:05:00.000, 04:05:01.000, 04:05:02.000, 04:05:03.000 buckets.
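To make the intended bucketing concrete, here is a minimal client-side sketch in Python of the expansion I have in mind. It is not an Elasticsearch query. It assumes ztime is the end time (as described above), so a request covers the window from ztime minus duration up to ztime; if ztime were the start time instead, the window would shift accordingly. The function names second_buckets and threads_per_second are hypothetical.

```python
from collections import defaultdict
from datetime import datetime, timedelta

# Timestamp layout used by the sample documents (month-day-year).
TIME_FORMAT = "%m-%d-%YT%H:%M:%S.%f%z"
ONE_SECOND = timedelta(seconds=1)


def second_buckets(ztime, duration_ms):
    """Return the 1-second buckets a request overlaps.

    Assumption: ztime is the END of the request, so the request
    covers the half-open window [ztime - duration, ztime).
    """
    end = datetime.strptime(ztime, TIME_FORMAT)
    start = end - timedelta(milliseconds=duration_ms)
    bucket = start.replace(microsecond=0)  # truncate to the whole second
    buckets = [bucket]                     # always count the starting second
    bucket += ONE_SECOND
    while bucket < end:
        buckets.append(bucket)
        bucket += ONE_SECOND
    return buckets


def threads_per_second(docs):
    """Map each 1-second bucket to the number of distinct active threads."""
    active = defaultdict(set)
    for doc in docs:
        for bucket in second_buckets(doc["ztime"], doc["duration"]):
            active[bucket].add(doc["thread"])
    return {bucket: len(threads) for bucket, threads in active.items()}


docs = [
    {"ztime": "10-17-2015T04:05:00.000+02:00", "duration": 4500,
     "thread": "http-nio-8080-exec-14"},
    {"ztime": "10-17-2015T04:07:42.227+02:00", "duration": 3100,
     "thread": "http-nio-8080-exec-25"},
]

for bucket, count in sorted(threads_per_second(docs).items()):
    print(bucket.isoformat(), count)
```

One possible way to get the same effect inside Elasticsearch might be to store this list of seconds in an extra multi-valued date field at index time and run the date_histogram on that field, but I have not verified that approach.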

What kind of query (Java API and/or REST API) would help me achieve this goal?

1 Answer

You need to use the cardinality aggregation here. It returns the number of unique values of a field within each bucket.

GET /{index}/{type}/_search?search_type=count
{
  "aggs": {
      "threads_per_hr": {
        "date_histogram": {
          "field": "ztime",
          "interval": "1s",
          "min_doc_count": 0
        },
       "aggs": {
          "per_hr_threads": {
             "cardinality": {
                "field": "thread"
             }
          }
       }
      }
  }
}
Vineeth Mohan
  • cardinality will also yield the value 1 for each bucket, because each call has only one document in ES. Had it been SQL, you would have joined the table with itself to get X rows for a request that took X seconds, and then counted the rows. I am looking for a similar calculation in ES. – Tsiyona Dershowitz Nov 12 '15 at 19:31
  • Are you looking for value_count? https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-metrics-valuecount-aggregation.html – Vineeth Mohan Nov 13 '15 at 03:56