
I am storing snapshots of data in Elasticsearch. I want to run a count metric aggregation over only the latest snapshot of each entry, so that I know what state my current (latest) data is in.

I have documents like this:

[
  {
    "id": 2,
    "state": "deleted",
    "timestamp": "2019-11-20T18:18:09+00:00"
  },
  {
    "id": 2,
    "state": "published",
    "timestamp": "2019-11-19T18:18:09+00:00"
  },
  {
    "id": 3,
    "state": "published",
    "timestamp": "2019-10-17T18:18:09+00:00"
  },
  {
    "id": 3,
    "state": "draft",
    "timestamp": "2019-10-16T18:18:09+00:00"
  }
]

I tried this:

POST /snapshots/_search
{
  "query": {
    "match_all": {}
  },
  "size": 0,
  "aggs": {
    "2": {
      "terms": {
        "field": "state.keyword"
      },
      "aggs": {
        "1": {
          "top_hits": {
            "size": 1,
            "sort": [
              {
                "timestamp": {
                  "order": "desc"
                }
              }
            ]
          }
        }
      }
    }
  }
}

But the problem is that it first creates a bucket per state, and only then sorts within each bucket and computes the top_hits. So instead of

deleted = 1

published = 1

draft = 0

It returns

deleted = 1

published = 1

draft = 1
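
To make the expected result concrete, here is the logic I am after, sketched in plain Python over the sample documents above. This is not an Elasticsearch query, only an illustration of the semantics; the function name `count_latest_states` is made up for this example.

```python
from collections import Counter
from datetime import datetime

# The sample snapshots from the question.
snapshots = [
    {"id": 2, "state": "deleted", "timestamp": "2019-11-20T18:18:09+00:00"},
    {"id": 2, "state": "published", "timestamp": "2019-11-19T18:18:09+00:00"},
    {"id": 3, "state": "published", "timestamp": "2019-10-17T18:18:09+00:00"},
    {"id": 3, "state": "draft", "timestamp": "2019-10-16T18:18:09+00:00"},
]


def count_latest_states(docs):
    """Count states, considering only the most recent snapshot per id."""
    latest = {}  # id -> (timestamp, state) of the newest snapshot seen so far
    for doc in docs:
        ts = datetime.fromisoformat(doc["timestamp"])
        current = latest.get(doc["id"])
        if current is None or ts > current[0]:
            latest[doc["id"]] = (ts, doc["state"])
    # Group the surviving (latest) snapshots by state.
    return Counter(state for _, state in latest.values())


counts = count_latest_states(snapshots)
print(dict(counts))  # {'deleted': 1, 'published': 1}
```

`draft` does not appear in the result because no `id` has `draft` as its latest state, which is the behaviour I want from the aggregation.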

Muhammad
  • What metric aggregation do you want to perform on which field? – Val Dec 05 '19 at 16:11
  • @Val I want to fetch a `count` grouped by the field `state`, where only the most recent document (by `timestamp`) for each `id` is considered. Does that make sense? – Muhammad Dec 05 '19 at 23:27
  • So for each `id` you want to know the number of documents in each state + additionally the most recent document? – Val Dec 06 '19 at 05:18
  • @Val in the example I provided, you see there are two documents with `id = 3`, right? I would check which is the most recent document with `id = 3`; in the above case it is the document with `timestamp = 2019-10-17T18:18:09+00:00`. I would count this document as 1 and put it in `state` *published*. So in the above scenario the total count of published would be just 1, because only one most-recent document has `state = published` – Muhammad Dec 06 '19 at 09:33
  • Ok, so each id can only ever count as 1 towards a single state... Basically, you want to know which state each id is in. – Val Dec 06 '19 at 09:35
  • @Val yes, the main purpose is to know how many records are in each state – Muhammad Dec 06 '19 at 14:05
