"Filter then Aggregation" or just "Filter Aggregation"?

Question

I have been working with ES (Elasticsearch) recently and found that I can achieve almost the same result in two ways, but I have no clear idea of the DIFFERENCE between them.

"Filter then Aggregation"

POST kibana_sample_data_flights/_search
{
  "size": 0,
  "query": {
    "constant_score": {
      "filter": {
        "term": {
          "DestCountry": "CA"
        }
      }
    }
  },
  "aggs": {
    "ca_weathers": {
      "terms": { "field": "DestWeather" }
    }
  }
}

"Filter Aggregation"

POST kibana_sample_data_flights/_search
{
  "size": 0,
  "aggs": {
    "ca": {
      "filter": {
        "term": {
          "DestCountry": "CA"
        }
      },
      "aggs": {
        "_weathers": {
          "terms": { "field": "DestWeather" }
        }
      }
    }
  }
}

My Questions

Why are there two similar features? I suspect I am missing something, so what is the difference? (Please ignore the result format; that is not what I am asking about ;p)
Which is better if I want to filter out the unrelated/unmatched documents and then run the aggregation on lots of documents?

We remove taglines and salutations from *all* posts, see the [expected behaviour](https://stackoverflow.com/help/behavior) rules in our help center. I’ve removed the line again, do not put it back please. The post has been locked temporarily to ensure you understand how important this is to us. — Martijn Pieters, Aug 29 '19 at 01:05

score 7 · Answer 1 · answered Aug 27 '19 at 03:49


When you put the filter in "query", it is evaluated against ALL the docs in your index. In this case it acts like a normal SQL filter: SELECT * FROM index WHERE (my_filter_condition1 AND my_filter_condition2 OR my_filter_condition3...).
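As a sketch of that SQL analogy, a WHERE clause of that shape could be written as a bool query in filter context. The three conditions below are hypothetical placeholders picked from the sample flights index (my assumption, not part of the original answer). SQL reads `c1 AND c2 OR c3` as `(c1 AND c2) OR c3`, so the AND pair becomes a nested bool inside a should list:

```json
# Hypothetical conditions: (DestCountry = CA AND DestWeather = Rain) OR OriginCountry = CA
POST kibana_sample_data_flights/_search
{
  "query": {
    "bool": {
      "filter": {
        "bool": {
          "should": [
            {
              "bool": {
                "filter": [
                  { "term": { "DestCountry": "CA" } },
                  { "term": { "DestWeather": "Rain" } }
                ]
              }
            },
            { "term": { "OriginCountry": "CA" } }
          ],
          "minimum_should_match": 1
        }
      }
    }
  }
}
```

Wrapping everything in the outer "filter" keeps it in filter context, so no relevance scores are computed.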

When you put it in "aggs", the filter is evaluated against the docs the query returned: all docs if there is no query, or only the already-filtered docs if there is one. Say you have a structure like:

#OPTION A
{
    "aggs":{
        "t_shirts" : {
            "filter" : { "term": { "type": "t-shirt" } }
        }
    }
}

Without a "query", this matches exactly the same documents as:

#OPTION B
{
    "query":{
        "bool": {
            "filter" : { "term": { "type": "t-shirt" } }
        }
    }
}

BUT the results will be returned in different sections of the response.

In Option A, the result (a doc_count) is returned in the aggregations section.

In Option B, the matching documents are returned in the hits section.
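To see the two scopes side by side, one request against the question's sample index can run a terms aggregation over every document the query matched next to a filter aggregation whose sub-aggregation only sees CA flights. This is a sketch; the aggregation names are made up:

```json
# No "query": "all_weathers" buckets every document in the index,
# while "ca_weathers" only sees the documents that pass the "ca" filter.
POST kibana_sample_data_flights/_search
{
  "size": 0,
  "aggs": {
    "all_weathers": {
      "terms": { "field": "DestWeather" }
    },
    "ca": {
      "filter": { "term": { "DestCountry": "CA" } },
      "aggs": {
        "ca_weathers": {
          "terms": { "field": "DestWeather" }
        }
      }
    }
  }
}
```

A filter aggregation narrows only its own sub-aggregations; sibling aggregations and the hits are unaffected. A filter in "query" narrows everything.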

I would recommend always applying your filters in the query part, so that any subsequent aggregations work on the already-filtered docs. Also, aggregations generally cost more than queries.
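If you follow that advice but still need a figure computed over the whole index, a global aggregation ignores the query scope. A minimal sketch reusing the question's fields (the aggregation names are hypothetical):

```json
# "ca_weathers" only sees CA flights, because the query filtered them;
# "all_flights" is a global aggregation, so its sub-aggregation sees every document.
POST kibana_sample_data_flights/_search
{
  "size": 0,
  "query": {
    "bool": {
      "filter": { "term": { "DestCountry": "CA" } }
    }
  },
  "aggs": {
    "ca_weathers": {
      "terms": { "field": "DestWeather" }
    },
    "all_flights": {
      "global": {},
      "aggs": {
        "all_weathers": {
          "terms": { "field": "DestWeather" }
        }
      }
    }
  }
}
```

Note that a global aggregation can only appear at the top level of "aggs".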

Hope this is helpful! :D

Kevin Quinzel

Thanks for reaching out. As my questions stated, I am now more curious about the performance impact; the result format doesn't matter much for my question. Thanks for the help ;) Would you please elaborate more on the performance aspect instead? I would really appreciate it ;p – Hearen Aug 27 '19 at 03:54

In option A, the aggregation will be run on ALL documents. In option B, the documents are first filtered and the aggregation will be run only on the selected documents. Say you have 10M documents and the filter selects only 100: it's pretty evident that option B will always be faster. – Val Aug 27 '19 at 06:06

@Val Thanks, Val. I actually thought it was something like what you explained but wasn't sure. And I was just hoping you would help me out. Huh! You really came to help and cleared up my confusion again! Really appreciate it ;) Thank you! – Hearen Aug 27 '19 at 08:06
This should be the accepted answer. The docs also make this clear here: https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations.html#change-agg-scope The query first limits the scope of the documents on which aggregation is performed. – Iguananaut Mar 08 '22 at 13:20

score 2 · Answer 2 · answered Aug 27 '19 at 07:34

Used in isolation, both filters are equivalent: if you don't load any results (hits), there is no difference. But you can combine listing and aggregations: query or filter your docs for the listing, and calculate aggregations on a bucket further limited by the aggs filter, like this:

POST kibana_sample_data_flights/_search
{
  "size": 100,
  "query": {
    "bool": {
      "filter": {
        "term": {
          ... some other filter
        }
      }
    }
  },
  "aggs": {
    "ca_filter": {
      "filter": {
        "term": {
          "DestCountry": "CA"
        }
      },
      "aggs": {
        "ca_weathers": {
          "terms": { "field": "DestWeather" }
        }
      }
    }
  }
}

More likely, though, you will need it the other way round, i.e. run aggregations on all docs to display summary information while displaying the docs from a specific query. In that case you need to combine aggregations with post_filter.
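A minimal sketch of that combination, assuming you want the listed hits restricted to CA flights while the weather summary covers all flights (the size and aggregation name are arbitrary choices):

```json
# With no "query", the aggregation runs on every document;
# post_filter is applied afterwards and narrows only the returned hits.
POST kibana_sample_data_flights/_search
{
  "size": 10,
  "aggs": {
    "all_weathers": {
      "terms": { "field": "DestWeather" }
    }
  },
  "post_filter": {
    "term": { "DestCountry": "CA" }
  }
}
```

When the aggregations should respect the filter as well, putting it in "query" is the simpler choice.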

score 0 · Accepted Answer · answered Oct 18 '19 at 08:21

Answer from @Val's comment, quoted here for reference:

In option A, the aggregation will be run on ALL documents. In option B, the documents are first filtered and the aggregation will be run only on the selected documents. Say you have 10M documents and the filter selects only 100: it's pretty evident that option B will always be faster.
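If you want to measure the difference on your own data, a search request accepts "profile": true, which adds per-shard timing for the query and the aggregations to the response. A sketch for the query-side variant; swap in the filter-aggregation body to compare, and expect the timings to depend on your cluster and data:

```json
# "profile": true adds a "profile" section with per-shard query and aggregation timings.
POST kibana_sample_data_flights/_search
{
  "profile": true,
  "size": 0,
  "query": {
    "bool": {
      "filter": { "term": { "DestCountry": "CA" } }
    }
  },
  "aggs": {
    "ca_weathers": {
      "terms": { "field": "DestWeather" }
    }
  }
}
```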
