
How can I use a filter in connection with an aggregate in elasticsearch?

The official documentation gives only trivial examples of filters and of aggregations, and no formal description of the query DSL (compare it with, e.g., the PostgreSQL documentation).

Through trial and error I found the following query, which Elasticsearch accepts (no parsing errors) but which ignores the given filters:

{
  "filter": {
    "and": [
      {
        "term": {
          "_type": "logs"
        }
      },
      {
        "term": {
          "dc": "eu-west-12"
        }
      },
      {
        "term": {
          "status": "204"
        }
      },
      {
        "range": {
          "@timestamp": {
            "from": 1398169707,
            "to": 1400761707
          }
        }
      }
    ]
  },
  "size": 0,
  "aggs": {
    "time_histo": {
      "date_histogram": {
        "field": "@timestamp",
        "interval": "1h"
      },
      "aggs": {
        "name": {
          "percentiles": {
            "field": "upstream_response_time",
            "percents": [
              98.0
            ]
          }
        }
      }
    }
  }
}

Some people suggest using a query instead of a filter, but the official documentation generally recommends the opposite for filtering on exact values. Another issue with queries: filters offer an and, whereas queries do not.

Can somebody point me to documentation, a blog, or a book that describes writing non-trivial queries, at minimum an aggregation combined with multiple filters?

geekQ

4 Answers

38

I ended up using a filter aggregation, not a filtered query, so I now have three nested aggs elements.

I also use a bool filter instead of and, as recommended by @alex-brasetvik, for the reasons explained in http://www.elasticsearch.org/blog/all-about-elasticsearch-filter-bitsets/

My final implementation:

{
  "aggs": {
    "filtered": {
      "filter": {
        "bool": {
          "must": [
            {
              "term": {
                "_type": "logs"
              }
            },
            {
              "term": {
                "dc": "eu-west-12"
              }
            },
            {
              "term": {
                "status": "204"
              }
            },
            {
              "range": {
                "@timestamp": {
                  "from": 1398176502000,
                  "to": 1400768502000
                }
              }
            }
          ]
        }
      },
      "aggs": {
        "time_histo": {
          "date_histogram": {
            "field": "@timestamp",
            "interval": "1h"
          },
          "aggs": {
            "name": {
              "percentiles": {
                "field": "upstream_response_time",
                "percents": [
                  98.0
                ]
              }
            }
          }
        }
      }
    }
  },
  "size": 0
}
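If the filter values come from variables, the body above can be built programmatically. This is a minimal Python sketch using only the json module; the helper name filter_agg_body and its parameters are hypothetical, and the output mirrors the JSON above (including 1.x-era syntax such as "interval" and _type). The range bounds are epoch milliseconds, which is how Elasticsearch interprets numeric values for date fields by default; the seconds-based values in the question would select dates in January 1970.

```python
import json

def filter_agg_body(terms, ts_from, ts_to, agg_name="filtered"):
    """Hypothetical helper: build a filter aggregation (bool/must of term
    filters plus a @timestamp range) wrapping the date_histogram/percentiles
    aggregations from the answer above."""
    must = [{"term": {field: value}} for field, value in terms.items()]
    # Range bounds in epoch milliseconds.
    must.append({"range": {"@timestamp": {"from": ts_from, "to": ts_to}}})
    return {
        "aggs": {
            agg_name: {  # any name works; results come back under it
                "filter": {"bool": {"must": must}},
                "aggs": {
                    "time_histo": {
                        "date_histogram": {"field": "@timestamp", "interval": "1h"},
                        "aggs": {
                            "name": {
                                "percentiles": {
                                    "field": "upstream_response_time",
                                    "percents": [98.0],
                                }
                            }
                        },
                    }
                },
            }
        },
        "size": 0,  # only the aggregations are needed, not the hits
    }

body = filter_agg_body(
    {"_type": "logs", "dc": "eu-west-12", "status": "204"},
    1398176502000,
    1400768502000,
)
print(json.dumps(body, indent=2))
```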
Hearen
geekQ
  • you're possibly my favourite person right now. Have been battling with this for hours. – simonmorley Nov 03 '14 at 18:35
  • In this solution the top-level aggregation is named "filtered", which should not be confused with the filtered query (http://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-filtered-query.html), so please use some other name (e.g. "aggresults"); the results appear under that name in the response. See the reference http://www.elastic.co/guide/en/elasticsearch/reference/master/search-aggregations-bucket-filter-aggregation.html and the answer http://stackoverflow.com/a/24823895/565525. – Robert Lujo Mar 24 '15 at 23:46
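To illustrate the last point: in the response, the filter aggregation's results sit under "aggregations" keyed by whatever name you chose. The sample below is hand-written with made-up numbers, shaped the way I understand these aggregations respond (a filter aggregation has a doc_count plus its sub-aggregations, and percentiles are keyed by strings such as "98.0"); the helper p98_per_hour is hypothetical.

```python
# Hand-written sample response for a filter aggregation named "aggresults";
# all numbers are invented for illustration.
sample_response = {
    "aggregations": {
        "aggresults": {
            "doc_count": 3,
            "time_histo": {
                "buckets": [
                    {"key": 1398178800000, "doc_count": 2,
                     "name": {"values": {"98.0": 0.5}}},
                    {"key": 1398182400000, "doc_count": 1,
                     "name": {"values": {"98.0": 0.25}}},
                ]
            },
        }
    }
}

def p98_per_hour(response, agg_name="aggresults"):
    """Return (bucket key, 98th percentile) pairs from the named filter aggregation."""
    histo = response["aggregations"][agg_name]["time_histo"]
    return [(b["key"], b["name"]["values"]["98.0"]) for b in histo["buckets"]]

print(p98_per_hour(sample_response))
```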
8

Put your filter in a filtered query.

The top-level filter only filters search hits, not facets or aggregations. It was renamed to post_filter in 1.0 because of this common confusion.
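To make the difference concrete, here is a toy in-memory model in Python (not how Elasticsearch is implemented) of the behavior described above: aggregations are computed over the documents matched by the query, while the top-level filter (post_filter since 1.0) only trims the hits that are returned. The documents and the search() and in_eu() helpers are hypothetical.

```python
from collections import Counter

# Hypothetical log documents, for illustration only.
DOCS = [
    {"dc": "eu-west-12", "status": "204"},
    {"dc": "eu-west-12", "status": "500"},
    {"dc": "us-east-1", "status": "204"},
]

def search(docs, query=None, post_filter=None, agg_field="status"):
    """Toy model: the query narrows both hits and aggregations;
    post_filter (the old top-level filter) narrows hits only."""
    matched = [d for d in docs if query is None or query(d)]
    # Aggregations are computed over everything the query matched...
    aggs = Counter(d[agg_field] for d in matched)
    # ...and the post filter is applied afterwards, to the returned hits alone.
    hits = [d for d in matched if post_filter is None or post_filter(d)]
    return hits, aggs

def in_eu(doc):
    return doc["dc"] == "eu-west-12"

hits, aggs = search(DOCS, post_filter=in_eu)
print(len(hits), dict(aggs))  # 2 hits, but the aggregation still counts all 3 docs
hits, aggs = search(DOCS, query=in_eu)
print(len(hits), dict(aggs))  # 2 hits, and the aggregation counts only those 2
```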

Also, you might want to look into this post on why you often want to use bool and not and/or: http://www.elasticsearch.org/blog/all-about-elasticsearch-filter-bitsets/
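The rewrite the linked post argues for (replacing and with a bool must) can be done mechanically. A minimal Python sketch, assuming the list form of the and/or filters used in the question; the function name and_or_to_bool is hypothetical. It maps and to must and or to should; the or case relies on a bool filter with only should clauses requiring at least one of them to match.

```python
import json

def and_or_to_bool(node):
    """Recursively rewrite {"and": [...]} / {"or": [...]} filters into bool filters."""
    if isinstance(node, list):
        return [and_or_to_bool(n) for n in node]
    if not isinstance(node, dict):
        return node  # plain values (strings, numbers) are left untouched
    if set(node) == {"and"} and isinstance(node["and"], list):
        return {"bool": {"must": and_or_to_bool(node["and"])}}
    if set(node) == {"or"} and isinstance(node["or"], list):
        return {"bool": {"should": and_or_to_bool(node["or"])}}
    # Any other filter (term, range, ...): recurse into its children.
    return {key: and_or_to_bool(value) for key, value in node.items()}

question_filter = {
    "and": [
        {"term": {"_type": "logs"}},
        {"term": {"dc": "eu-west-12"}},
        {"range": {"@timestamp": {"from": 1398169707, "to": 1400761707}}},
    ]
}
print(json.dumps(and_or_to_bool(question_filter), indent=2))
```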

Alex Brasetvik
4

Building on @geekQ's answer: to filter on strings that contain spaces, with conditions on multiple fields, use match_phrase clauses:

{
  "aggs": {
    "aggresults": {
      "filter": {
        "bool": {
          "must": [
            {
              "match_phrase": {
                "term_1": "some text with space 1"
              }
            },
            {
              "match_phrase": {
                "term_2": "some text with also space 2"
              }
            }
          ]
        }
      },
      "aggs": {
        "all_term_3s": {
          "terms": {
            "field": "term_3.keyword",
            "size": 10000,
            "order": {
              "_term": "asc"
            }
          }
        }
      }
    }
  },
  "size": 0
}
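Why match_phrase rather than term here: a term query is not analyzed, so a value like "some text with space 1" is compared against the individual tokens that an analyzed text field was split into, and no single token contains a space. The toy Python model below assumes a crude lowercase-and-split analyzer (a rough stand-in for the standard analyzer, not its real behavior) to show the difference; the helper names are hypothetical. A term query against a keyword sub-field such as term_1.keyword would be the exact-match alternative.

```python
import re

def analyze(text):
    """Crude stand-in for an analyzer: lowercase, then split into word tokens."""
    return re.findall(r"\w+", text.lower())

def term_matches(indexed_text, value):
    # term: the value is NOT analyzed and must equal one indexed token exactly.
    return value in analyze(indexed_text)

def match_phrase_matches(indexed_text, phrase):
    # match_phrase: the phrase is analyzed too, and its tokens must appear consecutively.
    tokens, wanted = analyze(indexed_text), analyze(phrase)
    return any(tokens[i:i + len(wanted)] == wanted
               for i in range(len(tokens) - len(wanted) + 1))

doc_value = "Some text with space 1"
print(term_matches(doc_value, "some text with space 1"))          # no single token matches
print(match_phrase_matches(doc_value, "some text with space 1"))  # token sequence matches
```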
ItamarG3
lyn
3

Just for reference, in version 7.2 I used something like the following to apply multiple filters to an aggregation:

POST movies/_search?size=0
{
  "size": 0,
  "aggs": {
    "test": {
      "filter": {
        "bool": {
          "must": {
            "term": {
              "genre": "action"
            }
          },
          "filter": {
            "range": {
              "year": {
                "gte": 1800,
                "lte": 3000
              }
            }
          }
        }
      },
      "aggs": {
        "year_hist": {
          "histogram": {
            "field": "year",
            "interval": 50
          }
        }
      }
    }
  }
}
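The filter narrows the documents first; the histogram then buckets what remains. A Python sketch of that two-step behavior, using the bucket-key formula given in the histogram aggregation documentation, floor((value - offset) / interval) * interval + offset. The movie data is made up and the histogram() helper is hypothetical.

```python
import math
from collections import Counter

def histogram(values, interval, offset=0):
    """Count values per bucket key: floor((value - offset) / interval) * interval + offset."""
    return Counter(math.floor((v - offset) / interval) * interval + offset for v in values)

movies = [  # hypothetical sample data
    {"genre": "action", "year": 1994},
    {"genre": "action", "year": 2012},
    {"genre": "drama", "year": 1999},
    {"genre": "action", "year": 1949},
]

# Step 1: the filter aggregation (genre == "action" and 1800 <= year <= 3000).
selected = [m["year"] for m in movies
            if m["genre"] == "action" and 1800 <= m["year"] <= 3000]
# Step 2: the year_hist sub-aggregation with interval 50.
print(sorted(histogram(selected, 50).items()))
```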
Hearen
  • Is there a way to put two terms in the filter? Inside must.term I want to add something like "term":{"genre":"action", "country":"USA"}. – haneulkim Nov 06 '19 at 06:57
  • @makewhite it seems a [`should`](https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-bool-query.html) clause with a suitable `minimum_should_match` could help. – Hearen Nov 07 '19 at 12:02
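For the question in the comments: a single term query takes only one field, so two conditions need two term clauses. The simplest form is an array of clauses under filter (or must); a should array with minimum_should_match equal to the number of clauses matches the same documents. The toy evaluator below (Python, hypothetical helper names, term clauses in the short {field: value} form only) checks that the two forms agree; its default for minimum_should_match follows the 7.x bool query documentation.

```python
def term_ok(clause, doc):
    # Unpacking raises ValueError if a term clause names more than one field.
    (field, value), = clause["term"].items()
    return doc.get(field) == value

def bool_ok(bool_node, doc):
    """Toy evaluator for a bool query whose clauses are all term queries."""
    def as_list(x):
        return x if isinstance(x, list) else [x]
    required = as_list(bool_node.get("must", [])) + as_list(bool_node.get("filter", []))
    should = as_list(bool_node.get("should", []))
    # Default per the 7.x docs: 1 if there are should clauses and no must/filter, else 0.
    default_msm = 1 if should and not required else 0
    msm = bool_node.get("minimum_should_match", default_msm)
    return (all(term_ok(c, doc) for c in required)
            and sum(term_ok(c, doc) for c in should) >= msm)

clauses = [{"term": {"genre": "action"}}, {"term": {"country": "USA"}}]
filter_form = {"filter": clauses}
should_form = {"should": clauses, "minimum_should_match": 2}

for doc in ({"genre": "action", "country": "USA"}, {"genre": "action", "country": "UK"}):
    print(doc, bool_ok(filter_form, doc), bool_ok(should_form, doc))
```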