I want to perform a query equivalent to the following MySQL query:

SELECT http_user, http_req_method, dst, dst_port, count(*) AS total
FROM my_table
WHERE http_req_method='GET' OR http_req_method='POST'
GROUP BY http_user, http_req_method, dst, dst_port

I built the following query:

{
    "query":{       
        "bool":{

            "should":[
                {
                    "term":{"http_req_method":"GET"}
                },
                {
                    "term":{"http_req_method":"POST"}
                }
            ]

        }
    },

    "aggs":{           
        "suser":{
            "terms":{
                "field":"http_user"
            },
            "aggs":{
                "dst":{
                    "terms":{
                        "field":"dst"
                    },
                    "aggs":{
                        "dst_port":{
                            "terms":{
                                "field":"dst_port"
                            },
                            "aggs":{
                                "http_req_method":{
                                    "terms":{
                                        "field":"http_req_method"
                                    }
                                }
                            }
                        }
                    }
                }
            }
        }
    }
}

(I might be missing some brackets there, but it's correct in my code.) The problem is that the results also include other methods, such as CONNECT, although I only ask for GET or POST. I thought aggregations were applied to the results of the query. Am I doing something wrong here?

Apostolos

2 Answers

I would leverage "minimum_should_match", like this:

"query":{       
    "bool":{
        "minimum_should_match": 1,
        "should":[
            {
                "term":{"http_req_method":"GET"}
            },
            {
                "term":{"http_req_method":"POST"}
            }
        ]

    }
},

Another way, which works better because a filter clause skips relevance scoring that is not needed here, is to use the terms query in a bool/filter clause instead:

"query":{       
    "bool":{
        "filter":[
            {
                "terms": {"http_req_method": ["GET", "POST"] }
            }
        ]
    }
},
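
To show how the terms filter fits together with the nested aggregations from the question, here is a minimal Python sketch that builds the whole request body. It only constructs and prints JSON and does not talk to a cluster. The helper `nested_terms` and the choice to name each aggregation after its field (the question uses "suser" for http_user) are my own assumptions for illustration, and "size": 0 is added on the assumption that only the buckets are wanted, not the hits.

```python
import json

# Fields to group by, outermost first (taken from the question's aggregation).
GROUP_FIELDS = ["http_user", "dst", "dst_port", "http_req_method"]


def nested_terms(fields):
    """Build nested terms aggregations, one level per field (hypothetical helper)."""
    aggs = {}
    # Work from the innermost field outwards so each level wraps the previous one.
    for field in reversed(fields):
        level = {"terms": {"field": field}}
        if aggs:
            level["aggs"] = aggs
        aggs = {field: level}
    return aggs


body = {
    # Assumption: only aggregation buckets are needed, so no hits are returned.
    "size": 0,
    "query": {
        "bool": {
            "filter": [
                {"terms": {"http_req_method": ["GET", "POST"]}}
            ]
        }
    },
    "aggs": nested_terms(GROUP_FIELDS),
}

print(json.dumps(body, indent=2))
```

Keep in mind that a terms aggregation returns only the top 10 buckets per level by default, so the "size" setting inside each "terms" block may need to be raised to see every group.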
Val

According to the latest Elasticsearch documentation, you should move the filter part inside the aggregation. Something like this:

{
    "aggs":{
        "get_post_requests":{
            "filter":{
                "bool":{
                    "should":[
                        { "term":{"http_req_method":"GET"} },
                        { "term":{"http_req_method":"POST"} }
                    ]
                }
            },
            "aggs":{
                "suser":{
                    "terms":{
                        "field":"http_user"
                    },
                    "aggs":{
                        "dst":{
                            "terms":{
                                "field":"dst"
                            },
                            "aggs":{
                                "dst_port":{
                                    "terms":{
                                        "field":"dst_port"
                                    },
                                    "aggs":{
                                        "http_req_method":{
                                            "terms":{
                                                "field":"http_req_method"
                                            }
                                        }
                                    }
                                }
                            }
                        }
                    }
                }
            }
        }
    }
}

Hope the parentheses are ok. Let me know if this gets you closer to the result :)
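
Once the request works, the nested buckets can be flattened into rows that look like the output of the SQL GROUP BY. The following is a minimal Python sketch, assuming the usual terms-aggregation response shape (each level has "buckets" whose entries carry "key" and "doc_count"). The function `flatten_buckets` is a hypothetical helper, and the sample response is hand-written for illustration, not real data.

```python
def flatten_buckets(node, agg_names, prefix=()):
    """Yield (key_1, ..., key_n, doc_count) tuples from nested terms aggregations."""
    name, rest = agg_names[0], agg_names[1:]
    for bucket in node[name]["buckets"]:
        keys = prefix + (bucket["key"],)
        if rest:
            # Descend into the next aggregation level inside this bucket.
            yield from flatten_buckets(bucket, rest, keys)
        else:
            # Innermost level: its doc_count is the group total.
            yield keys + (bucket["doc_count"],)


# Hypothetical, hand-written response fragment (not real data).
sample = {
    "aggregations": {
        "get_post_requests": {
            "doc_count": 3,
            "suser": {"buckets": [
                {"key": "alice", "doc_count": 3, "dst": {"buckets": [
                    {"key": "10.0.0.1", "doc_count": 3, "dst_port": {"buckets": [
                        {"key": 80, "doc_count": 3, "http_req_method": {"buckets": [
                            {"key": "GET", "doc_count": 2},
                            {"key": "POST", "doc_count": 1},
                        ]}},
                    ]}},
                ]}},
            ]},
        }
    }
}

# With the filter aggregation, the nested buckets live under its name.
rows = list(flatten_buckets(
    sample["aggregations"]["get_post_requests"],
    ["suser", "dst", "dst_port", "http_req_method"],
))

for row in rows:
    print(row)
```

If the filtering is done in the query instead of in a filter aggregation, the same function can be started from the "aggregations" object directly.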

Mihai Ionescu
  • The Elasticsearch documentation says the following: "An aggregation can be seen as a unit-of-work that builds analytic information over a set of documents. The context of the execution defines what this document set is (e.g. a top-level aggregation executes within the context of the executed query/filters of the search request)." From this I understand that aggs should be applied to the results of the filtered query. – Apostolos Oct 14 '16 at 19:12