Terms aggregation based on child collection property

Question

I have the following document article

{
    title: "Some title",
    authors: [
      { LastName: "Smith", Country: "US"},
      { LastName: "Smith", Country: "UK"},
    ]
}

I want to add a terms aggregator to the search based on the property Country of the collection of authors. The search should return a list of articles and an aggregation bucket of all the distinct countries. It seems a reverse nested aggregation is the way to go, but I cant make it work.

The search aggregation output should be something like this:

"aggregations": {
  "countries": {
      "buckets": [{
        "key": "US",
        "doc_count": 1
      }, {
        "key": "UK",
        "doc_count": 1
      }]
    }
  }

Can you site an sample output too. – Vineeth Mohan Sep 25 '15 at 15:45 — Vineeth Mohan, Sep 25 '15 at 15:45

score 1 · Answer 1 · answered Sep 25 '15 at 19:11

I think you can get what you want with a terms aggregation inside a nested aggregation.

I set up a simple index like this:

PUT /test_index
{
   "mappings": {
      "doc": {
         "properties": {
            "authors": {
               "type": "nested",
               "properties": {
                  "Country": {
                     "type": "string",
                     "index": "not_analyzed"
                  },
                  "LastName": {
                     "type": "string",
                     "index": "not_analyzed"
                  }
               }
            },
            "title": {
               "type": "string"
            }
         }
      }
   }
}

Then added a couple of documents:

PUT /test_index/doc/1
{
    "title": "Some title",
    "authors": [
      { "LastName": "Smith", "Country": "US"},
      { "LastName": "Smith", "Country": "UK"}
    ]
}

PUT /test_index/doc/2
{
    "title": "another title",
    "authors": [
      { "LastName": "Jones", "Country": "SA"},
      { "LastName": "Jones", "Country": "UK"}
    ]
}

Then ran this query:

POST /test_index/_search?search_type=count
{
   "aggs": {
      "authors": {
         "nested": {
            "path": "authors"
         },
         "aggs": {
            "author_countries": {
               "terms": {
                  "field": "authors.Country"
               }
            }
         }
      }
   }
}

which seems to return what you wanted:

{
   "took": 3,
   "timed_out": false,
   "_shards": {
      "total": 1,
      "successful": 1,
      "failed": 0
   },
   "hits": {
      "total": 2,
      "max_score": 0,
      "hits": []
   },
   "aggregations": {
      "authors": {
         "doc_count": 4,
         "author_countries": {
            "doc_count_error_upper_bound": 0,
            "sum_other_doc_count": 0,
            "buckets": [
               {
                  "key": "UK",
                  "doc_count": 2
               },
               {
                  "key": "SA",
                  "doc_count": 1
               },
               {
                  "key": "US",
                  "doc_count": 1
               }
            ]
         }
      }
   }
}

Here is some code I used for testing:

http://sense.qbox.io/gist/ccf7bd9d05f646507b3316e985dd6a50e905aed3

Hi Sloan, that's pretty close to what I was looking for! But with one caveat I could not fix. If you have an article with two or more authors with the same country, lets say UK, the bucket for UK returns a doc_count that matches the number of authors from the uk. What I really need is the count of articles from the uk. One solution would be to apply a distinct filter to the country aggregation before the count. I couldnt find a way to do this. — Hernan, Sep 25 '15 at 19:54

Terms aggregation based on child collection property

1 Answers1