With this mapping:

PUT pizzas
{
  "mappings": {
    "pizza": {
      "properties": {
        "name": {
          "type": "keyword"
        },
        "types": {
          "type": "nested",
          "properties": {
            "topping": {
              "type": "keyword"
            },
            "base": {
              "type": "keyword"
            }
          }
        }
      }
    }
  }
}

And this data:

PUT pizzas/pizza/1
{
  "name": "meat",
  "types": [
    {
      "topping": "bacon",
      "base": "normal"
    },
    {
      "topping": "pepperoni",
      "base": "normal"
    }
  ]
}

PUT pizzas/pizza/2
{
  "name": "veg",
  "types": [
    {
      "topping": "broccoli",
      "base": "normal"
    }
  ]
}

If I run this nested aggregation query:

GET pizzas/_search
{
  "size": 0,
  "aggs": {
    "types_agg": {
      "nested": {
        "path": "types"
      },
      "aggs": {
        "base_agg": {
          "terms": {
            "field": "types.base"
          }
        }
      }
    }
  }
}

I get this result:

{
  "took": 2,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 2,
    "max_score": 0,
    "hits": []
  },
  "aggregations": {
    "types_agg": {
      "doc_count": 3,
      "base_agg": {
        "doc_count_error_upper_bound": 0,
        "sum_other_doc_count": 0,
        "buckets": [
          {
            "key": "normal",
            "doc_count": 3
          }
        ]
      }
    }
  }
}

I expected my aggregation to return a doc_count of 2 because only two documents match my query. However, the aggregation seems to count each nested object under types as its own document, so it finds 3 nested entries and reports a doc_count of 3.

Is there any way to get it to return unique document counts?

(tested in Elasticsearch 5.4.3)

harvzor
    This is how I understand it: inside a nested aggregation, results are returned in the context of the nested type, so the counts are of nested objects rather than root documents. To get back to the root documents, add a reverse_nested aggregation. See https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-bucket-reverse-nested-aggregation.html – user3775217 Jul 21 '17 at 14:37

1 Answer

Just discovered the answer shortly after asking the question.

Changing the aggregation query to be:

GET pizzas/_search
{
  "size": 0,
  "aggs": {
    "types_agg": {
      "nested": {
        "path": "types"
      },
      "aggs": {
        "base_agg": {
          "terms": {
            "field": "types.base"
          },
          "aggs": {
            "top_reverse_nested": {
              "reverse_nested": {}
            }
          }
        }
      }
    }
  }
}

Yields the result:

{
  "took": 5,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 2,
    "max_score": 0,
    "hits": []
  },
  "aggregations": {
    "types_agg": {
      "doc_count": 3,
      "base_agg": {
        "doc_count_error_upper_bound": 0,
        "sum_other_doc_count": 0,
        "buckets": [
          {
            "key": "normal",
            "doc_count": 3,
            "top_reverse_nested": {
              "doc_count": 2
            }
          }
        ]
      }
    }
  }
}

The important part which was added to the query was:

"aggs": {
    "top_reverse_nested": {
        "reverse_nested": {}
    }
}

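The reverse_nested aggregation joins back from the nested objects to the root documents, so within each bucket top_reverse_nested.doc_count counts every parent document only once, however many of its nested objects fell into that bucket.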

You can read about reverse_nested in the Elasticsearch reference documentation for the reverse nested aggregation.
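
To see why the two counts differ, here is a rough plain-Python model of the counting, using the two pizza documents from the question. It is only an illustration of the idea, not how Elasticsearch computes aggregations internally; the names nested_counts, root_ids and root_counts are made up for this sketch.

```python
from collections import defaultdict

# The two pizza documents from the question, keyed by document id.
pizzas = {
    1: {"name": "meat", "types": [
        {"topping": "bacon", "base": "normal"},
        {"topping": "pepperoni", "base": "normal"},
    ]},
    2: {"name": "veg", "types": [
        {"topping": "broccoli", "base": "normal"},
    ]},
}

# Models the terms aggregation under "nested": one count per nested object.
nested_counts = defaultdict(int)
# Models "reverse_nested": the distinct parent document ids behind each bucket.
root_ids = defaultdict(set)

for doc_id, doc in pizzas.items():
    for nested in doc["types"]:
        nested_counts[nested["base"]] += 1
        root_ids[nested["base"]].add(doc_id)

# Each parent document is counted once per bucket.
root_counts = {base: len(ids) for base, ids in root_ids.items()}

print(dict(nested_counts))  # {'normal': 3}
print(root_counts)          # {'normal': 2}
```

The first number corresponds to the bucket's own doc_count and the second to top_reverse_nested.doc_count in the response above.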

harvzor