22

I'm trying to filter buckets for nested aggregations.

Mapping:

{
  "dev.directory.3" : {
    "mappings" : {
      "profile" : {
        "properties" : {
          "events" : {
            "type" : "nested",
            "properties" : {
              "id" : {
                "type" : "integer"
              },
              "name" : {
                "type" : "string",
                "index" : "not_analyzed"
              },
            }
          },
          "title" : {
            "type" : "string"
          }
        }
      }
    }
  }
}

Index data:

"hits" : {
    "total" : 1,
    "max_score" : 1.0,
    "hits" : [ {
      "_index" : "dev.directory.3",
      "_type" : "profile",
      "_id" : "1",
      "_score" : 1.0,
      "_source" : {
        "title" : "Project manager",
        "events" : [ 
          {
            "id" : 1,
            "name" : "Event A",
          }, 
          {
            "id" : 2,
            "name" : "Event B",
          },
          {
            "id" : 3,
            "name" : "Event C",
          },
          {
            "id" : 4,
            "name" : "Event D",
          } 
        ],
      }
    }
  ]
}

I'm using this query and aggregation definition

{
    "query": {
        "nested": {
            "path": "events",
            "query": {
                "bool": {
                    "filter": [{
                        "terms": {
                            "events.id": [1, 2]
                        }
                    }]
                }
            },
            "inner_hits": {}
        }
    },
    "aggs": {
        "events.name12": {
            "filter": {},
            "aggs": {
                "inner": {
                    "nested": {
                        "path": "events"
                    },
                    "aggs": {
                        "events.name": {
                            "terms": {
                                "field": "events.name"
                            }
                        },
                        "events.name_count": {
                            "cardinality": {
                                "field": "events.name"
                            }
                        }
                    }
                }
            }
        }
    },
    "size": 20,
    "_source": ["email", "company_name", "events"]
}

What I am getting is from aggregation result is:

"aggregations": {
    "events.name12": {
      "doc_count": 2,
      "filtered": {
        "doc_count": 4,
        "events.name": {
          "buckets": [
            {
              "key": "Event A",
              "doc_count": 1
            },
            {
              "key": "Event B",
              "doc_count": 1
            },
            {
              "key": "Event C",
              "doc_count": 1
            },
            {
              "key": "Event D",
              "doc_count": 1
            }
          ]
        },
        "events.name_count": {
          "value": 4
        }
      }
    }

I struggling with filtering buckets list only to provided event ids, so the result should be like:

"aggregations": {
    "events.name12": {
      "doc_count": 2,
      "filtered": {
        "doc_count": 2,
        "events.name": {
          "buckets": [
            {
              "key": "Event A",
              "doc_count": 1
            },
            {
              "key": "Event B",
              "doc_count": 1
            }
          ]
        },
        "events.name_count": {
          "value": 2
        }
      }
    }
Rafał Łyczkowski
  • 995
  • 2
  • 11
  • 27

1 Answers1

37

You're almost there, you simply need to add the filter on events.id in your aggregation as well, like this:

{
  "query": {
    "nested": {
      "path": "events",
      "query": {
        "bool": {
          "filter": [
            {
              "terms": {
                "events.id": [
                  1,
                  2
                ]
              }
            }
          ]
        }
      },
      "inner_hits": {}
    }
  },
  "aggs": {
    "events.name12": {
      "nested": {
        "path": "events"
      },
      "aggs": {
        "inner": {
          "filter": {
            "terms": {
              "events.id": [
                1,
                2
              ]
            }
          },
          "aggs": {
            "events.name": {
              "terms": {
                "field": "events.name"
              }
            },
            "events.name_count": {
              "cardinality": {
                "field": "events.name"
              }
            }
          }
        }
      }
    }
  },
  "size": 20,
  "_source": [
    "email",
    "company_name",
    "events"
  ]
}

The reason is that your query will correctly select all documents for which there are nested events with the specified event IDs, however, your aggregation will then work on all nested events from all selected documents. So you need to filter out all nested events from those documents that don't have the right IDs in the aggregation as well.

Val
  • 207,596
  • 13
  • 358
  • 360
  • Thanks for answer! I've tried this one but it gets me no aggregation result, no buckets. I was surprised as well for that. Maybe some true data: – Rafał Łyczkowski Dec 21 '16 at 16:33
  • 2
    Sorry my bad, the filter needs to be inside the nested aggregation, otherwise it won't work... I've fixed it. Please try again. – Val Dec 21 '16 at 16:35
  • 2
    Life saver! You'll get my rep! Thanks you! Anyways, that was tricky part to move nested upper aggregation and filter downwards... – Rafał Łyczkowski Dec 21 '16 at 16:39
  • 2
    Yes, just remember that the nested aggregation just sets the context so that you can work on it, hence the filter needs to go inside it. – Val Dec 21 '16 at 16:43
  • inside agg `filter`, should be `term` instead of `terms` – Rashidul Islam Sep 29 '19 at 13:18
  • @RashidulIslam nope since there are two terms, you need `terms` and not `term` – Val Sep 29 '19 at 14:35
  • @Val One doubt, why do you specify `events.id` in array `[]` in `filter aggregation` . – Gibbs Jul 10 '20 at 09:39
  • @Gibbs because the terms query requires it – Val Jul 10 '20 at 09:50
  • @Val My bad. Thanks. In `term query` we can specify `events.id:2` since it `terms` it has to check for multiple. Hence it required `[]`Am I correct? – Gibbs Jul 10 '20 at 09:53
  • @Val How can I convert this query in bodyBuilder.js syntax? any idea? – H_H Nov 11 '21 at 09:36