I'm trying to set up a search query that runs a composite aggregation over a multi-level nested field and returns some sub-aggregation metrics for each bucket. The composite aggregation returns its buckets as expected, but the sub-aggregation metrics are 0 for every bucket. I'm not sure whether I'm pointing the sub-aggregation at the wrong fields or whether it belongs in a different part of the query.

My collection looks similar to the following:

{
  id: '32ead132eq13w21',
  statistics: {
    clicks: 123,
    views: 456
  },
  categories: [{ //nested type
    name: 'color',
    tags: [{ //nested type
      slug: 'blue'
    },{
      slug: 'red'
    }]
  }]
}

Below you can find what I have tried so far. All buckets come back with a clicks sum of 0 even though every document has a clicks value set.

GET /acounts-123321/_search
{
  "size": 0,
  "aggs": {
    "nested_categories": {
      "nested": {
        "path": "categories"
      },
      "aggs": {
        "nested_tags": {
          "nested": {
            "path": "categories.tags"
          },
          "aggs": {
            "group": {
              "composite": {
                "size": 100,
                "sources": [
                  { "slug": { "terms": { "field": "categories.tags.slug" } } }
                ]
              },
              "aggregations": {
                "clicks": {
                  "sum": {
                    "field": "statistics.clicks"
                  }
                }
              }
            }
          }
        }
      }
    }
  }
}

The response body I have so far:

{
  "took" : 6,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 1304,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [ ]
  },
  "aggregations" : {
    "nested_categories" : {
      "doc_count" : 1486,
      "nested_tags" : {
        "doc_count" : 1486,
        "group" : {
          "buckets" : [
            {
              "key" : {
                "slug" : "red"
              },
              "doc_count" : 268,
              "clicks" : {
                "value" : 0.0
              }
            }, {
              "key" : {
                "slug" : "blue"
              },
              "doc_count" : 122,
              "clicks" : {
                "value" : 0.0
              }
            },
            .....
          ]
        }
      }
    }
  }
}
adolfosrs

1 Answer

In order for this to work, all sources in the composite aggregation would need to be under the same nested context.

I answered something similar a while ago. The asker needed to put the nested values onto the top level. You have the opposite challenge: given that the statistics.clicks field is on the top level, you'd need to duplicate it across each entry of categories.tags, which, I suspect, won't be feasible because you're likely updating these stats every now and then…
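
To make the "nested context" point concrete, here is a toy Python model of why the sum comes back as 0. It is not Elasticsearch code, and every name in it (parents, nested_docs, the two helper functions) is made up for illustration. It assumes the simplified view that each nested entry is stored as its own hidden document carrying only its own fields, so a metric evaluated inside the nested context never sees statistics.clicks; joining back to the parent document first makes the value visible again.

```python
# Toy model only: NOT Elasticsearch code. It mimics the idea that nested
# entries live as separate hidden documents next to their parent document.
parents = {
    "doc1": {"statistics": {"clicks": 10}, "tags": ["blue", "red"]},
    "doc2": {"statistics": {"clicks": 5}, "tags": ["blue"]},
}

# Each nested tag becomes its own hidden document with only its own fields.
nested_docs = [
    {"parent_id": pid, "slug": slug}
    for pid, doc in parents.items()
    for slug in doc["tags"]
]


def sum_in_nested_context(slug):
    """Sum clicks over the nested docs only; they carry no statistics.clicks."""
    return sum(
        d.get("statistics", {}).get("clicks", 0)
        for d in nested_docs
        if d["slug"] == slug
    )


def sum_via_parent(slug):
    """Join back to the parent docs first, counting each parent once."""
    parent_ids = {d["parent_id"] for d in nested_docs if d["slug"] == slug}
    return sum(parents[pid]["statistics"]["clicks"] for pid in parent_ids)


print(sum_in_nested_context("blue"))  # 0
print(sum_via_parent("blue"))  # 15
```

In this model the first function reproduces the all-zero buckets from the question, while the second one reflects what happens once the aggregation steps back out to the parent documents.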

If you're OK with skipping the composite approach and using the terms agg without it, you could make the summation work by jumping back to the top level through reverse_nested:

{
  "size": 0,
  "aggs": {
    "nested_tags": {
      "nested": {
        "path": "categories.tags"
      },
      "aggs": {
        "by_slug": {
          "terms": {
            "field": "categories.tags.slug",
            "size": 100
          },
          "aggs": {
            "back_to_parent": {
              "reverse_nested": {},
              "aggs": {
                "clicks": {
                  "sum": {
                    "field": "statistics.clicks"
                  }
                }
              }
            }
          }
        }
      }
    }
  }
}

This will work just as well but won't offer pagination.
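
With the aggregation names used in the query above, each bucket's sum ends up under back_to_parent.clicks.value rather than directly on the bucket. A minimal Python sketch for flattening that into a slug-to-clicks mapping, assuming the response body has already been parsed into a dict; clicks_by_slug and sample_response are hypothetical names, and the numbers are made up rather than real results:

```python
def clicks_by_slug(response):
    """Map each tag slug to the clicks sum found under reverse_nested."""
    buckets = response["aggregations"]["nested_tags"]["by_slug"]["buckets"]
    return {b["key"]: b["back_to_parent"]["clicks"]["value"] for b in buckets}


# Made-up response fragment for illustration; the values are not real results.
sample_response = {
    "aggregations": {
        "nested_tags": {
            "doc_count": 3,
            "by_slug": {
                "buckets": [
                    {
                        "key": "blue",
                        "doc_count": 2,
                        "back_to_parent": {
                            "doc_count": 2,
                            "clicks": {"value": 15.0},
                        },
                    },
                    {
                        "key": "red",
                        "doc_count": 1,
                        "back_to_parent": {
                            "doc_count": 1,
                            "clicks": {"value": 10.0},
                        },
                    },
                ]
            },
        }
    }
}

print(clicks_by_slug(sample_response))
```

Note that the terms bucket key is a plain string here, unlike the composite buckets in the question, where the key is an object such as {"slug": "red"}.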


Clarification

If you needed a color filter, you could do:

{
  "size": 0,
  "aggs": {
    "categories_parent": {
      "nested": {
        "path": "categories"
      },
      "aggs": {
        "filtered_by_color": {
          "filter": {
            "term": {
              "categories.name": "color"
            }
          },
          "aggs": {
            "nested_tags": {
              "nested": {
                "path": "categories.tags"
              },
              "aggs": {
                "by_slug": {
                  "terms": {
                    "field": "categories.tags.slug",
                    "size": 100
                  },
                  "aggs": {
                    "back_to_parent": {
                      "reverse_nested": {},
                      "aggs": {
                        "clicks": {
                          "sum": {
                            "field": "statistics.clicks"
                          }
                        }
                      }
                    }
                  }
                }
              }
            }
          }
        }
      }
    }
  }
}
Joe - GMapsBook.com
  • Thank you so much for your answer. That should probably work out for me. It's still not clear to me how I would do a filter aggregation in such a situation, though. What if I want to filter-aggregate only tags inside the category with name `color`? On what level should I add such an aggregation? – adolfosrs Feb 07 '21 at 20:37
  • No problem! I've added such a filter example to my answer. Hope it helps. Hey -- such a highly nested syntax (no pun intended) may appear overwhelming at first, esp. for folks coming from a firebase background. In my Elasticsearch Handbook I try to bring clarity into topics like this and I think it'd bring you value. [Let me know what other topics besides `nested` fields interest you](https://jozefsorocin.typeform.com/to/XeQRxdwV) and I'll let you know when the handbook becomes available! – Joe - GMapsBook.com Feb 07 '21 at 21:43
  • Hello again @joe-sorocin. Agreed, I'm just starting on ES and it's a totally different world. Still learning a lot and have some questions that would be pretty nice to clear up when comparing firebase and ES. I will make sure to add some thoughts on your form. The handbook sounds like a very nice idea. In the meantime, considering my example, is there any way I can also sort the aggregated buckets by `clicks`? I've tried using `bucket_sort` at the inner `aggs` inside `back_to_parent` but it looks like that is not allowed. Let me know if you want me to open a new question to make things clear. – adolfosrs Feb 19 '21 at 17:19
  • Nice! I did use firebase in a project or two so I'll squeeze some from-firebase-to-ES tips in the handbook! @ bucket sort -- yes, a new question would be better. Feel free to tag me. – Joe - GMapsBook.com Feb 19 '21 at 19:07
  • Great to hear that! Most of my concerns are on the data schema design level of ES when comparing to firebase. Using arrays as a structure on firebase is something we want to try to avoid, but due to some limitations regarding ES's data indexation you are kind of forced to use them, as far as I could notice. I've just published a new question. Would love to hear from you there. And please let me know if my question's structure is fine or if you think I should improve in some aspect. https://stackoverflow.com/questions/66294446/sorting-a-reverse-nested-back-to-parent-aggregation – adolfosrs Feb 20 '21 at 17:26
  • All good, the question is fine. OK I hear you regarding the schema design. Please put it into my [form](https://jozefsorocin.typeform.com/to/XeQRxdwV) too and be as specific as possible -- I mean, you're already using `nested` which is great for avoiding array flattening inconsistencies. What's your heaviest concern? How to handle updates when there are no cloud functions? – Joe - GMapsBook.com Feb 20 '21 at 18:05