Filter Elasticsearch Aggregation by Bucket Key Value

Question

I have an Elasticsearch index of documents in which there is a field that contains a list of URLs. Aggregating on this field gives me a document count for each unique URL, as expected.

GET models*/_search
{
  "query": {
    "match_all": {}
  },
  "size": 0,
  "aggs": {
    "links": {
      "terms": {
        "field": "links.keyword",
        "size": 10
      }
    }
  }
}

I then want to filter out the buckets whose keys do not contain a certain string. I've tried doing so with the Bucket Selector Aggregation.

This attempt:

GET models*/_search
{
  "query": {
    "match_all": {}
  },
  "size": 0,
  "aggs": {
    "links": {
      "terms": {
        "field": "links.keyword",
        "size": 10
      }
    },
    "links_key_filter": {
      "bucket_selector": {
        "buckets_path": {
          "key": "links"
        },
        "script": "!key.contains('foo')"
      }
    }
  }
}

Fails with:

Invalid pipeline aggregation named [links_key_filter] of type [bucket_selector]. Only sibling pipeline aggregations are allowed at the top level

Putting the bucket selector inside the links aggregation, like so:

GET models*/_search
{
  "query": {
    "match_all": {}
  },
  "size": 0,
  "aggs": {
    "links": {
      "terms": {
        "field": "links.keyword",
        "size": 10
      },
      "bucket_selector": {
        "buckets_path": {
          "key": "links"
        },
        "script": "!key.contains('foo')"
      }
    }
  }
}

fails with:

Found two aggregation type definitions in [links]: [terms] and [bucket_selector]

I'm going to keep tinkering but am a bit stuck at the moment :(

I think in the second case you're missing the `aggs` section in which the `links_key_filter` aggregation should go. - Val, Nov 23 '17 at 15:02

Joe - GMapsBook.com · Answer 1 · 2021-03-18T16:54:11.763

You won't be able to use the `bucket_selector` here, because its `buckets_path`

"must reference either a number value or a single value numeric metric aggregation" [source]

and what a `terms` aggregation on a keyword field produces is denoted as StringTerms. Its bucket keys are strings, so that simply won't work, regardless of whether you force a placeholder multibucket aggregation or not.
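For contrast, here is a rough sketch of a `bucket_selector` in a position and with a path that Elasticsearch does accept: nested under the `terms` aggregation's own `aggs`, and referencing the numeric `_count` of each bucket. The name `min_count_filter` and the threshold of 2 are made up for illustration. It filters by document count, not by key, so it does not solve the question as asked.

```
# min_count_filter and the threshold 2 are hypothetical, for illustration only
GET models*/_search
{
  "size": 0,
  "aggs": {
    "links": {
      "terms": {
        "field": "links.keyword",
        "size": 10
      },
      "aggs": {
        "min_count_filter": {
          "bucket_selector": {
            "buckets_path": {
              "count": "_count"
            },
            "script": "params.count >= 2"
          }
        }
      }
    }
  }
}
```

Buckets for which the script returns true are kept. The rest are dropped from the response.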

Having said that, each `terms` aggregation supports the `exclude` filter.

Assuming that your links are arrays of keywords:

POST models/_doc/1
{
  "links": [
    "google.com",
    "wikipedia.org"
  ]
}

POST models/_doc/2
{
  "links": [
    "reddit.com",
    "google.com"
  ]
}

and you'd like to group everything except reddit, you can use the following regex:

POST models*/_search
{
  "query": {
    "match_all": {}
  },
  "size": 0,
  "aggs": {
    "links": {
      "terms": {
        "field": "links.keyword",
        "exclude": ".*reddit.*",
        "size": 10
      }
    }
  }
}

BTW, there are some non-trivial implications arising from the usage of such regexes, esp. when you imagine a case-sensitive scenario in which you'd need a query-time-generated regex, as discussed in "How to correctly query inside of terms aggregate values in elasticsearch, using include and regex?"
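Since the question asks to keep only the buckets whose key contains a given string, the mirror-image `include` parameter of the `terms` aggregation may be the closer fit. A minimal sketch, assuming the same `links.keyword` field, with `foo` standing in for whatever substring you are after:

```
# "foo" is a placeholder substring, not a value from the sample documents
POST models*/_search
{
  "size": 0,
  "aggs": {
    "links": {
      "terms": {
        "field": "links.keyword",
        "include": ".*foo.*",
        "size": 10
      }
    }
  }
}
```

The pattern has to match the whole term, hence the leading and trailing `.*`. With `include` the filtering happens while the buckets are being built, so `size` applies to the matching keys only.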

score -2 · Answer 2 · answered Sep 14 '18 at 11:17

GET models*/_search
{
  "query": {
    "match_all": {}
  },
  "size": 0,
  "aggs": {
    "links": {
      "terms": {
        "field": "links.keyword",
        "size": 10
      }
    },
    "bucket_selector": {
      "buckets_path": {
        "key": "links"
      },
      "script": "!key.contains('foo')"
    }
  }
}

Your selector should come a level up: it should sit directly in `aggs`, parallel to your `links` terms aggregation. I am not sure about the key filtering.

score -2 · Answer 3 · answered Jun 14 '19 at 07:39

You can use "_key" to get keys:

GET models*/_search
{
  "query": {
    "match_all": {}
  },
  "size": 0,
  "aggs": {
    "links": {
      "terms": {
        "field": "links.keyword",
        "size": 10
      },
      "aggs": {
        "links_key_filter": {
          "bucket_selector": {
            "buckets_path": {
              "key": "_key"
            },
            "script": "!params.key.contains('foo')"
          }
        }
      }
    }
  }
}
