0

Let's call my root level foo and my child level events. I want to aggregate on the events level but with a filter that EITHER the event has color "orange" OR the parent foo has customerId "35".

So, I want to have a filter aggregation that's inside a nested aggregation. In this filter's query clause, I have one child that refers to a field on foo and the other refers to a field on events. However, that first child has no way to actually reference the parent like that! I can't use a reverse_nested aggregation because I can't put one of those as a child of a compound query, and I can't filter before nesting because I'd lose the OR semantics that way. How do I reference the field on foo?

Concrete example if it helps. Mapping:

{
  "foo": {
    "properties": {
      "customer_id": { "type": "long" },
      "events": {
        "type": "nested",
        "properties": {
          "color": { "type": "keyword" },
          "coord_y": { "type": "double" }
        }
      }
    }
  }
}

(update for clarity: that's an index named foo with the root mapping named foo)

The query I want to be able to make:

{
  "aggs": {
    "OP0_nest": {
      "nested": { "path": "events" },
      "aggs": {
        "OP0_custom_filter": {
          "filter": {
            "bool": {
              "should": [
                { "term": { "events.color": "orange" } },
                { "term": { "customer_id": 35 } }
              ]
            }
          },
          "aggs": {
            "OP0_op": {
              "avg": { "field": "events.coord_y" }
            }
          }
        }
      }
    }
  }
}

Of course, this does not work, because the child of the should clause containing customer_id does not work. That term query is always false because customer_id can't be accessed inside the nested aggregation.

Thanks in advance!

K. M
  • 867
  • 1
  • 8
  • 17

1 Answers1

3

Since the fields you want to apply filter on are at different levels you need to make query for each level separately and place them in should clause of bool query which becomes the filter for our filter aggregation. In this aggregation we then add a nested aggregation to get the avg of coord_y.

The aggregation will be (UPDATED: since foo is index name removed foo from field names):

{
  "aggs": {
    "OP0_custom_filter": {
      "filter": {
        "bool": {
          "should": [
            {
              "term": {
                "customer_id": 35
              }
            },
            {
              "nested": {
                "path": "events",
                "query": {
                  "term": {
                    "events.color": "orange"
                  }
                }
              }
            }
          ]
        }
      },
      "aggs": {
        "OP0_op": {
          "nested": {
            "path": "events"
          },
          "aggs": {
            "OP0_op_avg": {
              "avg": {
                "field": "events.coord_y"
              }
            }
          }
        }
      }
    }
  }
}
Nishant
  • 7,504
  • 1
  • 21
  • 34
  • right, but now you're filtering `foo` rather than filtering `events`. so we'll still be averaging the coord_y for events that are neither orange nor belonging to customer with ID 35 with this query, right? – K. M Dec 14 '18 at 05:53
  • Okay so I've tested this and my theory was right. What this query is doing is saying, filter `foo` down to a bucket of `foo` docs that either are for customer ID 35 or have any orange child `events` doc. But, the problem with that is, once I nest to the `events` level that filtering isn't respected anymore. The filtering needs to occur after I've nested to the `events` level but I specifically CANT filter anymore once I've nested to the `events` level. – K. M Dec 14 '18 at 06:33
  • Also, have I made a mistake in my example? The `foo` is just the name of the index, `foo` itself is not nested within anything. – K. M Dec 14 '18 at 06:33
  • *once I nest to the events level that filtering isn't respected anymore. The filtering needs to occur after I've nested to the events level but I specifically CANT filter anymore once I've nested to the events level.* How does this differ in expected result? – Nishant Dec 15 '18 at 05:18
  • It differs in that my expected result is that I only get events that either have color = orange, or whose parent foo has customerid = 35. Your query will still aggregate events that have neither color = orange nor whose parent foo has customerid = 35. It aggregates events that either have parent foo customer = 35 OR parent foo with ANY child events with color = orange. – K. M Dec 15 '18 at 08:34
  • It's `should` not `must_not`. How come my query aggregate neither - nor I wonder. – Nishant Dec 15 '18 at 10:01