
I have documents in an index that look like this:

{
  "foo": null,
  "bars": [
    {
      "baz": "BAZ",
      "qux": null,
      "bears": [
        {
          "fruit": "banana"
        }
      ]
    }
  ]
}

I want to aggregate the unique values of .bars[].bears[].fruit, with a count for each value found. However, I only want to count these deep values for documents that match certain conditions on foo, and for elements of bars[] that match certain conditions on baz and qux. I also want to aggregate over all documents, ignoring whatever the search query happens to be.

The following query does everything I want to do:

{
  "aggs": {
    "global": {
      "global": {},
      "aggs": {
        "notFoo": {
          "filter": {
            "bool": {
              "must_not": [
                {
                  "exists": {
                    "field": "foo"
                  }
                }
              ]
            }
          },
          "aggs": {
            "bars": {
              "nested": {
                "path": "bars"
              },
              "aggs": {
                "notValueN": {
                  "filter": {
                    "bool": {
                      "filter": [
                        {
                          "bool": {
                            "should": [
                              {
                                "terms": {
                                  "bars.baz": [
                                    "value1",
                                    "value2",
                                    "value3"
                                  ]
                                }
                              },
                              {
                                "terms": {
                                  "bars.qux": [
                                    "value4",
                                    "value5",
                                    "value6"
                                  ]
                                }
                              }
                            ],
                            "minimum_should_match": 1
                          }
                        }
                      ]
                    }
                  },
                  "aggs": {
                    "bears": {
                      "nested": {
                        "path": "bars.bears"
                      },
                      "aggs": {
                        "rules": {
                          "terms": {
                            "field": "bars.bears.fruit"
                          }
                        }
                      }
                    }
                  }
                }
              }
            }
          }
        }
      }
    }
  }
}

This query works, but it feels rather large and onerous. To get the result I'm looking for out of the response, I have to access .aggregations.global.notFoo.bars.notValueN.bears.rules.buckets. Is there a way to flatten this large query? As it stands, it would be very hard to maintain should any additional conditions need to be introduced later.

knpwrs

1 Answer


As far as I know, the only place where Elasticsearch supports object key flattening is in settings APIs such as the Cluster settings API. Unfortunately, this strategy cannot be used in other parts of the API, including aggregations.
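
Since the flattening cannot happen on the Elasticsearch side, one option is to do it in the client. The following is only a sketch of that idea and not something Elasticsearch provides: nest_aggs is a hypothetical helper name, and the steps list simply restates the aggregation chain from the question as a flat list that gets folded into nested "aggs" clauses.

```python
def nest_aggs(steps):
    """Fold a flat list of (name, body) pairs into nested "aggs" clauses.

    The first pair becomes the outermost aggregation; each later pair is
    placed under the "aggs" key of the pair before it.
    """
    nested = None
    # Walk from the innermost step outwards, wrapping as we go.
    for name, body in reversed(steps):
        agg = dict(body)  # shallow copy so the caller's dicts are not mutated
        if nested is not None:
            agg["aggs"] = nested
        nested = {name: agg}
    return {"aggs": nested}


# The chain from the question, one entry per nesting level.
steps = [
    ("global", {"global": {}}),
    ("notFoo", {"filter": {"bool": {"must_not": [{"exists": {"field": "foo"}}]}}}),
    ("bars", {"nested": {"path": "bars"}}),
    ("notValueN", {"filter": {"bool": {"filter": [{"bool": {
        "should": [
            {"terms": {"bars.baz": ["value1", "value2", "value3"]}},
            {"terms": {"bars.qux": ["value4", "value5", "value6"]}},
        ],
        "minimum_should_match": 1,
    }}]}}}),
    ("bears", {"nested": {"path": "bars.bears"}}),
    ("rules", {"terms": {"field": "bars.bears.fruit"}}),
]

query = nest_aggs(steps)  # dict ready to be serialized as the search body
```

Adding a level later then means adding one entry to the list rather than re-indenting the whole request body.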

There are a few other tricks worth mentioning, though.

1. First off, there's Aggregation Metadata.

Whoever is tasked with post-processing the heavily nested aggregation results will appreciate knowing the target buckets path. You can provide it through an aggregation metadata clause:

POST your-index/_search
{
  "aggs": {
    "global": {
      "global": {},
      "meta": {
        "accessor_path": "aggs.global.Foo...."  <---
      },
      ...

which will return

{
  "aggregations" : {
    "global" : {
      "meta" : {
        "accessor_path" : "aggs.global.Foo..."  <---
      },
      "Foo" : {
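
On the consuming side, the stored path can then be followed mechanically. Here is a minimal sketch, assuming the path is dot-separated and that a leading "aggs" stands for the "aggregations" key of the response; buckets_from_response is a hypothetical helper, and the sample response is made up and trimmed (a real one also carries doc_count at every level).

```python
def buckets_from_response(response, agg_name="global"):
    """Read meta.accessor_path from a top-level aggregation and follow it."""
    path = response["aggregations"][agg_name]["meta"]["accessor_path"]
    keys = path.split(".")
    # Assumption: the path abbreviates "aggregations" as "aggs",
    # as in the example path used in this answer.
    if keys[0] == "aggs":
        keys[0] = "aggregations"
    node = response
    for key in keys:
        node = node[key]
    return node


# Made-up, trimmed response for illustration only.
response = {
    "aggregations": {
        "global": {
            "meta": {
                "accessor_path": "aggs.global.Foo.bars.notValueN.bears.rules.buckets"
            },
            "Foo": {"bars": {"notValueN": {"bears": {"rules": {
                "buckets": [{"key": "banana", "doc_count": 1}]
            }}}}},
        }
    }
}

buckets = buckets_from_response(response)
```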

2. Then, there's Response Filtering

If you have multiple (sub)aggregations enclosed in the same request body, you can reduce the response "bulkiness" through the filter_path query parameter:

POST your-index/_search?filter_path=aggregations.global.meta,aggregations.global.*.*.*.*.*.buckets
{
  "aggs": {
    "global": {
      ...

This may not help much in your case, because your aggregation is a single straightforward chain without many sibling sub-aggregations to filter out.
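
If the aggregation names are known up front, the filter_path can also be derived from the same accessor path instead of counting wildcards by hand. A small sketch under that assumption; search_url is a hypothetical helper, and treating a leading "aggs" as "aggregations" is again my own convention, not something Elasticsearch defines.

```python
from urllib.parse import urlencode


def search_url(index, accessor_path, meta_agg="global"):
    """Build a _search URL that keeps only the meta block and the target buckets."""
    keys = accessor_path.split(".")
    if keys[0] == "aggs":  # assumption: "aggs" abbreviates "aggregations"
        keys[0] = "aggregations"
    filter_path = ",".join([
        f"aggregations.{meta_agg}.meta",
        ".".join(keys),
    ])
    # Keep the comma and wildcard characters readable in the query string.
    return f"/{index}/_search?" + urlencode({"filter_path": filter_path}, safe=",*")


url = search_url(
    "your-index",
    "aggs.global.Foo.bars.notValueN.bears.rules.buckets",
)
```

This keeps the path used for filtering and the path used for reading the buckets in one place.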

3. Finally, let's talk about maintainability

When working with reusable queries, Elasticsearch offers the Search template API. You'd construct a script containing a parametrized mustache template and then provide the parameters at query time.

In your particular use case, I'd propose the following:

  1. Store the mustache template script:
POST _scripts/nested_bars_query
{
  "script": {
    "lang": "mustache",
    "source": """
      {
        "query": {{#toJson}}raw_search_query{{/toJson}},
        "aggs": {
          "global": {
            "global": {},
            "meta": {
              "accessor_path": "{{accessor_path}}"
            },
            "aggs": {
              "{{notXYZ.agg_name}}": {
                "filter": {
                  "bool": {
                    "must_not": [
                      {
                        "exists": {
                          "field": "{{notXYZ.field_name}}"
                        }
                      }
                    ]
                  }
                },
                "aggs": {
                  "bars": {
                    "nested": {
                      "path": "bars"
                    },
                    "aggs": {
                      "{{notValueN.agg_name}}": {
                        "filter": {
                          "bool": {
                            "filter": [
                              {
                                "bool": {
                                  "should": {{#toJson}}notValueN.raw_should_clauses{{/toJson}},
                                  "minimum_should_match": 1
                                }
                              }
                            ]
                          }
                        },
                        "aggs": {
                          "bears": {
                            "nested": {
                              "path": "bars.bears"
                            },
                            "aggs": {
                              "rules": {
                                "terms": {
                                  "field": "bars.bears.fruit"
                                }
                              }
                            }
                          }
                        }
                      }
                    }
                  }
                }
              }
            }
          }
        }
      }
    """
  }
}
  2. Target the _search/template endpoint with the search template id (nested_bars_query). Also, specify the filter_path and the metadata accessor_path that were discussed above:
POST your-index-name/_search/template?filter_path=aggregations.global.meta,aggregations.global.*.*.*.*.*.buckets
{
  "id": "nested_bars_query",
  "params": {
    "raw_search_query": {
      "match_all": {}
    },
    "accessor_path": "aggs.global.Foo.bars.notValueN.bears.rules.buckets",
    "notXYZ": {
      "agg_name": "Foo",
      "field_name": "foo"
    },
    "notValueN": {
      "agg_name": "notValueN",
      "raw_should_clauses": [
        {
          "terms": {
            "bars.baz": [
              "BAZ",
              "value2",
              "value3"
            ]
          }
        },
        {
          "terms": {
            "bars.qux": [
              "value4",
              "value5",
              "value6"
            ]
          }
        }
      ]
    }
  }
}

You could of course standardize the above by removing the possibility to define custom agg_names etc.

Should you need to introduce additional conditions later, you can modify the raw_should_clauses list inside the params.
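
From application code, that amounts to editing one list before the template is called. A rough sketch using only the Python standard library; template_request is a hypothetical helper, and the base URL and the extra term clause are placeholders rather than values from the question.

```python
import json
import urllib.request


def template_request(base_url, index, should_clauses):
    """Build the POST request for the stored nested_bars_query template."""
    body = {
        "id": "nested_bars_query",
        "params": {
            "raw_search_query": {"match_all": {}},
            "accessor_path": "aggs.global.Foo.bars.notValueN.bears.rules.buckets",
            "notXYZ": {"agg_name": "Foo", "field_name": "foo"},
            "notValueN": {
                "agg_name": "notValueN",
                "raw_should_clauses": should_clauses,
            },
        },
    }
    return urllib.request.Request(
        url=f"{base_url}/{index}/_search/template",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


clauses = [
    {"terms": {"bars.baz": ["BAZ", "value2", "value3"]}},
    {"terms": {"bars.qux": ["value4", "value5", "value6"]}},
]
# A later condition is one more list entry; the stored template is untouched.
clauses.append({"term": {"bars.baz": "value7"}})  # hypothetical extra condition

request = template_request("http://localhost:9200", "your-index-name", clauses)

# Sending it needs a reachable cluster, so it is left commented out here:
# with urllib.request.urlopen(request) as resp:
#     result = json.load(resp)
```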

Joe - GMapsBook.com