9

Say, I have following documents:

1st doc:

{
  productName: "product1",
  tags: [
    {
      "name":"key1",
      "value":"value1"
    },
    {
      "name":"key2",
      "value":"value2"
    }
  ]
}

2nd doc:

{
  productName: "product2",
  tags: [
    {
      "name":"key1",
      "value":"value1"
    },
    {
      "name":"key2",
      "value":"value3"
    }
  ]
}

I know if I want to group by productName, I could use a terms aggregation

"terms": {
    "field": "productName"
}

which will give me two buckets with two different keys "product1", "product2".

However, what should the query be if I would like to group by tag key? i.e. I would like to group by tag with name==key1, then I am expecting one bucket with key="value1"; while if I group by tag with name==key2, I am expecting the result to be two buckets with keys "value2", "value3".

What should the query look like if I would like to group by the 'value' inside a nested array but not group by the 'key'? Any suggestion?

pickypg
  • 22,034
  • 5
  • 72
  • 84
Christina.Y
  • 93
  • 1
  • 3

1 Answers1

12

It sounds like a nested terms aggregation is what you're looking for.

With the two documents you posted, this query:

POST /test_index/_search
{
   "size": 0,
   "aggs": {
      "product_name_terms": {
         "terms": {
            "field": "product_name"
         }
      },
      "nested_tags": {
         "nested": {
            "path": "tags"
         },
         "aggs": {
            "tags_name_terms": {
               "terms": {
                  "field": "tags.name"
               }
            },
            "tags_value_terms": {
               "terms": {
                  "field": "tags.value"
               }
            }
         }
      }
   }
}

returns this:

{
   "took": 67,
   "timed_out": false,
   "_shards": {
      "total": 5,
      "successful": 5,
      "failed": 0
   },
   "hits": {
      "total": 2,
      "max_score": 0,
      "hits": []
   },
   "aggregations": {
      "product_name_terms": {
         "doc_count_error_upper_bound": 0,
         "sum_other_doc_count": 0,
         "buckets": []
      },
      "nested_tags": {
         "doc_count": 4,
         "tags_name_terms": {
            "doc_count_error_upper_bound": 0,
            "sum_other_doc_count": 0,
            "buckets": [
               {
                  "key": "key1",
                  "doc_count": 2
               },
               {
                  "key": "key2",
                  "doc_count": 2
               }
            ]
         },
         "tags_value_terms": {
            "doc_count_error_upper_bound": 0,
            "sum_other_doc_count": 0,
            "buckets": [
               {
                  "key": "value1",
                  "doc_count": 2
               },
               {
                  "key": "value2",
                  "doc_count": 1
               },
               {
                  "key": "value3",
                  "doc_count": 1
               }
            ]
         }
      }
   }
}

Here is some code I used to test it:

http://sense.qbox.io/gist/a9a172f41dbd520d5e61063a9686055681110522

EDIT: Filter by Nested Value

As per your comment, if you want to filter the nested results by a value (of the nested results), you can add another "layer" of aggregation making use of the filter aggregation as follows:

POST /test_index/_search
{
   "size": 0,
   "aggs": {
      "nested_tags": {
         "nested": {
            "path": "tags"
         },
         "aggs": {
            "filter_tag_name": {
               "filter": {
                  "term": {
                     "tags.name": "key1"
                  }
               },
               "aggs": {
                  "tags_name_terms": {
                     "terms": {
                        "field": "tags.name"
                     }
                  },
                  "tags_value_terms": {
                     "terms": {
                        "field": "tags.value"
                     }
                  }
               }
            }
         }
      }
   }
}

which returns:

{
   "took": 10,
   "timed_out": false,
   "_shards": {
      "total": 5,
      "successful": 5,
      "failed": 0
   },
   "hits": {
      "total": 2,
      "max_score": 0,
      "hits": []
   },
   "aggregations": {
      "nested_tags": {
         "doc_count": 4,
         "filter_tag_name": {
            "doc_count": 2,
            "tags_name_terms": {
               "doc_count_error_upper_bound": 0,
               "sum_other_doc_count": 0,
               "buckets": [
                  {
                     "key": "key1",
                     "doc_count": 2
                  }
               ]
            },
            "tags_value_terms": {
               "doc_count_error_upper_bound": 0,
               "sum_other_doc_count": 0,
               "buckets": [
                  {
                     "key": "value1",
                     "doc_count": 2
                  }
               ]
            }
         }
      }
   }
}

Here's the updated code:

http://sense.qbox.io/gist/507c3aabf36b8f6ed8bb076c8c1b8552097c5458

Sloan Ahrens
  • 8,588
  • 2
  • 29
  • 31
  • Thanks but what if I only want to group by the value only associate with one tag key? So I'm expecting some way that could allow me to group by tags.value that only associate with tags.name==key2 without returning any bucket associate with key1's values. Is there such way? – Christina.Y Jan 15 '16 at 22:18