
I am very new to advanced searching with Elasticsearch, and mostly use Kibana.

In Kibana, I can do a simple search like this:

type:apache_access

That turns into this request:

{
  "version": true,
  "size": 500,
  "sort": [
    {
      "@timestamp": {
        "order": "desc",
        "unmapped_type": "boolean"
      }
    }
  ],
  "_source": {
    "excludes": []
  },
  "aggs": {
    "2": {
      "date_histogram": {
        "field": "@timestamp",
        "interval": "10m",
        "time_zone": "America/Los_Angeles",
        "min_doc_count": 1
      }
    }
  },
  "stored_fields": [
    "*"
  ],
  "script_fields": {},
  "docvalue_fields": [
    {
      "field": "@timestamp",
      "format": "date_time"
    }
  ],
  "query": {
    "bool": {
      "must": [
        {
          "range": {
            "@timestamp": {
              "gte": 1613652915536,
              "lte": 1613696115536,
              "format": "epoch_millis"
            }
          }
        }
      ],
      "filter": [
        {
          "bool": {
            "should": [
              {
                "match": {
                  "type": "apache_access"
                }
              }
            ],
            "minimum_should_match": 1
          }
        }
      ],
      "should": [],
      "must_not": []
    }
  },
  "highlight": {
    "pre_tags": [
      "@kibana-highlighted-field@"
    ],
    "post_tags": [
      "@/kibana-highlighted-field@"
    ],
    "fields": {
      "*": {}
    },
    "fragment_size": 2147483647
  }
}

What I would like to do is the equivalent of a MySQL GROUP BY on the clientip field (or maybe clientip.keyword?). My goal is to see which IP addresses show up the most in the access logs in a given timeframe.

Kibana does this for me in the UI with its "Top 5 values in 500 / 500 records" feature, but that only covers the first 500 records. I want the aggregate over the entire time period.


1 Answer


When you strip this query to the bare minimum and add a terms aggregation on the top 100 IPs, you'll end up with:

{
  "size": 0,
  "query": {
    "bool": {
      "filter": [
        {
          "range": {
            "@timestamp": {
              "gte": 1613652915536,
              "lte": 1613696115536,
              "format": "epoch_millis"
            }
          }
        },
        {
          "match": {
            "type": "apache_access"
          }
        }
      ]
    }
  },
  "aggs": {
    "by_ip": {
      "terms": {
        "field": "clientip.keyword",
        "size": 100
      }
    }
  }
}
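
If you want to run this yourself, Kibana's Dev Tools console is the quickest place to paste it (curl against the same _search endpoint works too). The apache-* index pattern below is only a placeholder; point it at whichever index or index pattern your access logs actually live in:

GET apache-*/_search
{
  "size": 0,
  "query": {
    "bool": {
      "filter": [
        { "range": { "@timestamp": { "gte": 1613652915536, "lte": 1613696115536, "format": "epoch_millis" } } },
        { "match": { "type": "apache_access" } }
      ]
    }
  },
  "aggs": {
    "by_ip": { "terms": { "field": "clientip.keyword", "size": 100 } }
  }
}

Because "size": 0 suppresses the hits themselves, the response carries only the counts: one { "key": "<ip>", "doc_count": <n> } bucket per client IP under aggregations.by_ip.buckets, sorted by doc_count in descending order.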
  • This is great! However, I still need to retain some form of date limitation (the clientip field was only recently corrected), and it seems to mix our Dev/Stage/Prod data, which the "Discover" tab in Kibana lets us limit through an index pattern. Would you mind showing me how to add both of those to this query? – Tyler V. Feb 19 '21 at 18:09
  • Gotcha. Can you tell me what field covers the Dev/Stage/Prod flag and which date range you'd need to restrict? Feel free to use sql and I'll convert it to elasticsearch DSL for ya. – Joe - GMapsBook.com Feb 19 '21 at 19:06
  • For Dev/Stage/Prod, I'm not sure how to include that in the query. In Kibana, the UI gives me a dropdown to select from, and even when Kibana shows you the query it wrote, the environment filter isn't in it. This part may need to be a separate investigation on my part. For the date part, I simply need to limit the aggregation to rows created after a certain time: @timestamp gte some-datetime. – Tyler V. Feb 19 '21 at 23:35
  • Makes sense. I updated my answer with the range filter -- you can add as many filters as you like into the `filter` array. – Joe - GMapsBook.com Feb 20 '21 at 08:54
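
For reference, an environment restriction would just be one more clause in that same filter array. The environment field name below is only a guess, since the thread never pins down which field (if any) carries the Dev/Stage/Prod flag; if the environments actually live in separate indices, which the Kibana index-pattern dropdown suggests, narrowing the index pattern in the request path does the same job without any extra clause:

"filter": [
  { "range": { "@timestamp": { "gte": 1613652915536, "lte": 1613696115536, "format": "epoch_millis" } } },
  { "match": { "type": "apache_access" } },
  { "match": { "environment": "prod" } }
]

Everything else in the aggregation query above stays the same.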