I want to count how many times each IP accessed each product, per day.

There are three fields in one index (nginx-access-log):

  • timestamp
  • clientip
  • product_id
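To pin down what the result should be, here is a plain-Python sketch of the same computation over a few in-memory log records. The record layout (dicts with `timestamp`, `clientip`, `product_id`) and the sample values are hypothetical. Days are bucketed in UTC+8 to mirror the `Asia/Shanghai` time zone the question uses, assuming a fixed offset is acceptable (Shanghai observes no DST).

```python
from collections import Counter
from datetime import datetime, timedelta, timezone

# Asia/Shanghai is UTC+8 with no DST, so a fixed-offset tzinfo is used here (assumption).
SHANGHAI = timezone(timedelta(hours=8))


def count_per_day_product_ip(records):
    """Count requests per (day, product_id, clientip).

    `records` is an iterable of dicts with an ISO-8601 `timestamp`
    (including a UTC offset), a `clientip` and a `product_id`.
    """
    counts = Counter()
    for rec in records:
        # Convert to local Shanghai time before taking the calendar day.
        ts = datetime.fromisoformat(rec["timestamp"]).astimezone(SHANGHAI)
        key = (ts.date().isoformat(), rec["product_id"], rec["clientip"])
        counts[key] += 1
    return counts


# Hypothetical sample records, for illustration only.
logs = [
    {"timestamp": "2018-07-09T20:00:00+00:00", "clientip": "10.0.0.1", "product_id": "p1"},
    {"timestamp": "2018-07-10T01:00:00+00:00", "clientip": "10.0.0.1", "product_id": "p1"},
    {"timestamp": "2018-07-10T02:00:00+00:00", "clientip": "10.0.0.2", "product_id": "p1"},
    {"timestamp": "2018-07-10T03:00:00+00:00", "clientip": "10.0.0.1", "product_id": "p2"},
]

for (day, product, ip), n in sorted(count_per_day_product_ip(logs).items()):
    print(day, product, ip, n)
```

The first record falls on 2018-07-09 in UTC but on 2018-07-10 in UTC+8, which is why the time zone matters for daily buckets.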

I know the date_histogram aggregation can handle the per-day part: https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-bucket-datehistogram-aggregation.html

And that counting might be done with the cardinality aggregation: https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-metrics-cardinality-aggregation.html#_precision_control

But I have no idea how to combine these aggregations into one query.
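The cardinality aggregation mentioned above answers a different question: it estimates how many *distinct* values a field has in a bucket (approximately, with `precision_threshold` trading memory for accuracy), not how many requests each IP made. This small sketch contrasts the two on hypothetical data for one product on one day: per-IP request counts, which nested `terms` buckets report as `doc_count`, versus the distinct-IP count a `cardinality` aggregation on `clientip` would estimate.

```python
from collections import Counter

# Hypothetical requests for one product on one day: one client IP per request.
ips = ["10.0.0.1", "10.0.0.1", "10.0.0.2", "10.0.0.3", "10.0.0.1"]

# What nested terms buckets report: the number of requests (doc_count) per IP.
per_ip_counts = Counter(ips)

# What a cardinality aggregation on clientip estimates: the number of distinct IPs.
distinct_ips = len(set(ips))

print(dict(per_ip_counts))  # {'10.0.0.1': 3, '10.0.0.2': 1, '10.0.0.3': 1}
print(distinct_ips)  # 3
```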


Update:

I used the following query:

GET log-nginx_access*/_search 
{
  "aggs": {
    "by_day": {
      "date_histogram": {
        "field": "timestamp",
        "interval": "1d",
        "time_zone": "Asia/Shanghai",
        "min_doc_count": 1
      },
      "aggs": {
        "by_product": {
          "terms": {
            "field": "uri_args.product_id",
            "size": 100
          }
        },
        "aggs": {
          "by_ip": {
            "terms": {
              "field": "clientip"
            }
          }
        }
      }
    }
  }
}

I got this error:

{
  "error": {
    "root_cause": [
      {
        "type": "unknown_named_object_exception",
        "reason": "Unknown BaseAggregationBuilder [by_ip]",
        "line": 18,
        "col": 20
      }
    ],
    "type": "unknown_named_object_exception",
    "reason": "Unknown BaseAggregationBuilder [by_ip]",
    "line": 18,
    "col": 20
  },
  "status": 400
}
Mithril

1 Answer

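We can nest terms aggregations inside a date_histogram aggregation: bucket by day, then by product within each day, then by IP within each product. A sub-aggregation goes under an `aggs` key inside its parent aggregation, next to the aggregation type (`terms`, `date_histogram`). In the query from the question, `aggs` sits beside `by_product` instead of inside it, so Elasticsearch reads `aggs` as the name of another aggregation and `by_ip` as its type, hence `Unknown BaseAggregationBuilder [by_ip]`.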

Ref: https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-bucket-terms-aggregation.html

GET /log-nginx_access*/_search
{
  "size": 0,
  "aggs": {
    "by_day": {
      "date_histogram": {
        "field": "timestamp",
        "interval": "day",
        "time_zone": "Asia/Shanghai",
        "min_doc_count": 1
      },
      "aggs": {
        "by_product": {
          "terms": {
            "field": "uri_args.product_id",
            "size": 100
          },
          "aggs": {
            "by_ip": {
              "terms": {
                "field": "clientip"
              }
            }
          }
        }
      }
    }
  }
}
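If the request body is built from code rather than typed into Kibana, a small helper makes the nesting rule hard to get wrong: each sub-aggregation is attached under an `"aggs"` key inside its parent aggregation. The helper name `agg` is hypothetical, and the field names follow the question. `interval` is the parameter name the question uses (Elasticsearch 6.x); Elasticsearch 7.2 and later prefer `calendar_interval`. The resulting dict can be sent as the body of `_search` with any HTTP client.

```python
import json


def agg(name, agg_type, params, sub=None):
    """Hypothetical helper: return {name: {agg_type: params, "aggs": sub}}.

    The "aggs" key is only added when sub-aggregations are given.
    """
    body = {agg_type: params}
    if sub:
        # Sub-aggregations sit inside the parent aggregation, beside its type.
        body["aggs"] = sub
    return {name: body}


by_ip = agg("by_ip", "terms", {"field": "clientip"})
by_product = agg("by_product", "terms", {"field": "uri_args.product_id", "size": 100}, sub=by_ip)
by_day = agg(
    "by_day",
    "date_histogram",
    {"field": "timestamp", "interval": "1d", "time_zone": "Asia/Shanghai", "min_doc_count": 1},
    sub=by_product,
)

# "size": 0 returns only aggregation results, no individual hits.
search_body = {"size": 0, "aggs": by_day}
print(json.dumps(search_body, indent=2))
```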

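Each bucket in a terms aggregation response has a doc_count, the number of matching documents. For a `by_ip` bucket, that is the number of requests the IP made for that product on that day, which is what you want; no cardinality aggregation is needed. Keep the size parameter in mind: a terms aggregation returns only the top 10 buckets by default. `"size": 100` on `by_product` returns up to 100 products per day, while `by_ip` (no size set) returns only the 10 most active IPs per product, so raise its size if you need more.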
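To turn the nested buckets into flat rows, walk `aggregations.by_day.buckets`, then each day's `by_product.buckets`, then each product's `by_ip.buckets`. The response below is a hand-written, hypothetical excerpt containing only the fields the walk reads (`key_as_string`, `key`, `doc_count`); a real response carries more.

```python
def flatten(response):
    """Yield (day, product_id, clientip, doc_count) rows from the nested buckets."""
    for day in response["aggregations"]["by_day"]["buckets"]:
        for product in day["by_product"]["buckets"]:
            for ip in product["by_ip"]["buckets"]:
                yield day["key_as_string"], product["key"], ip["key"], ip["doc_count"]


# Hypothetical, trimmed response for illustration only.
response = {
    "aggregations": {
        "by_day": {
            "buckets": [
                {
                    "key_as_string": "2018-07-10T00:00:00.000+08:00",
                    "doc_count": 4,
                    "by_product": {
                        "buckets": [
                            {
                                "key": "p1",
                                "doc_count": 3,
                                "by_ip": {"buckets": [
                                    {"key": "10.0.0.1", "doc_count": 2},
                                    {"key": "10.0.0.2", "doc_count": 1},
                                ]},
                            },
                            {
                                "key": "p2",
                                "doc_count": 1,
                                "by_ip": {"buckets": [{"key": "10.0.0.1", "doc_count": 1}]},
                            },
                        ]
                    },
                }
            ]
        }
    }
}

for row in flatten(response):
    print(row)
```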

deerawan
  • @Mithril Interesting. Did you run it in Kibana and get that error? – deerawan Jul 10 '18 at 05:39
  • Yes, I used Kibana Dev Tools to test the query. – Mithril Jul 10 '18 at 05:52
  • @Mithril I mistakenly put the `by_ip` aggs in the wrong place. Please try again with the updated answer. – deerawan Jul 10 '18 at 06:22
  • Thank you! So Elasticsearch first buckets by `timestamp`, then by `product` within each `day` interval, and finally by `ip` within each product for every day. That's the way to construct the query. – Mithril Jul 10 '18 at 06:47
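Because every key inside an `aggs` object is read as an aggregation name, a misplaced `"aggs"` key shows up as an aggregation *named* `aggs`. A rough, hypothetical lint for that mistake can walk the aggregation tree and report such names. It is a heuristic sketch, not an Elasticsearch feature; the sample bodies are abridged versions of the broken and fixed queries.

```python
AGG_CONTAINER_KEYS = ("aggs", "aggregations")


def find_misplaced_aggs(aggs, path="aggs"):
    """Return paths of aggregations that are *named* "aggs"/"aggregations".

    `aggs` is the dict under a request's "aggs" key: every key is an
    aggregation name and every value holds one aggregation type plus,
    optionally, a nested "aggs"/"aggregations" dict.
    """
    problems = []
    for name, definition in aggs.items():
        here = f"{path}.{name}"
        if name in AGG_CONTAINER_KEYS:
            # A container keyword used as an aggregation name: likely misplaced.
            problems.append(here)
        if not isinstance(definition, dict):
            continue
        for key in AGG_CONTAINER_KEYS:
            if isinstance(definition.get(key), dict):
                problems.extend(find_misplaced_aggs(definition[key], f"{here}.{key}"))
    return problems


# Abridged "aggs" section of the question's query: "aggs" sits beside "by_product".
broken = {
    "by_day": {
        "date_histogram": {"field": "timestamp", "interval": "1d"},
        "aggs": {
            "by_product": {"terms": {"field": "uri_args.product_id", "size": 100}},
            "aggs": {"by_ip": {"terms": {"field": "clientip"}}},
        },
    }
}

# Abridged fixed version: "aggs" is inside "by_product".
fixed = {
    "by_day": {
        "date_histogram": {"field": "timestamp", "interval": "1d"},
        "aggs": {
            "by_product": {
                "terms": {"field": "uri_args.product_id", "size": 100},
                "aggs": {"by_ip": {"terms": {"field": "clientip"}}},
            }
        },
    }
}

print(find_misplaced_aggs(broken))  # ['aggs.by_day.aggs.aggs']
print(find_misplaced_aggs(fixed))  # []
```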