
Elasticsearch doesn't keep previous versions of a document, so I implemented versioning myself using approach #3 from this great answer: https://stackoverflow.com/a/8226684/4769188.

Now I want to retrieve all versions of some type within the date range [from..to] and keep only the most recent version of each document. How can I do this?

Taras Kohut
  • If you have implemented #3, then only your most recent versions will be in a separate index, right? Why do you want to retrieve all the versions if you only care about the most recent one? Or do you mean: get all versions that fall within a certain date range and, among those possibly old versions, select the most recent? – jay Sep 20 '16 at 18:01
  • @jay I mean get all versions which belong to a certain date range, and among those select the most recent ones. – Taras Kohut Sep 20 '16 at 18:17

1 Answer

See if this helps...

I indexed the following documents (shown here as returned by a search on `test_index`):

{
  "took": 2,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 4,
    "max_score": 1,
    "hits": [
      {
        "_index": "test_index",
        "_type": "test",
        "_id": "2",
        "_score": 1,
        "_source": {
          "doc_id": 123,
          "version": 2,
          "text": "Foo Bar",
          "date": "2011-09-01",
          "current": false
        }
      },
      {
        "_index": "test_index",
        "_type": "test",
        "_id": "4",
        "_score": 1,
        "_source": {
          "doc_id": 123,
          "version": 4,
          "text": "Foo Bar",
          "date": "2011-07-01",
          "current": false
        }
      },
      {
        "_index": "test_index",
        "_type": "test",
        "_id": "1",
        "_score": 1,
        "_source": {
          "doc_id": 123,
          "version": 1,
          "text": "Foo Bar",
          "date": "2011-10-01",
          "current": true
        }
      },
      {
        "_index": "test_index",
        "_type": "test",
        "_id": "3",
        "_score": 1,
        "_source": {
          "doc_id": 123,
          "version": 3,
          "text": "Foo Bar",
          "date": "2011-08-01",
          "current": false
        }
      }
    ]
  }
}

Use the following query. With the sample data above it should return version 3 of the document, the highest version whose date falls within the range. The "size" parameter inside "top_hits" determines how many documents are returned per bucket (currently set to 1).

{
    "size" : 0,
    "query" : {
        "filtered" : {
            "query" : {
                "match_all" : {}
            },
            "filter" : {
                "range" : {
                    "date" : {
                        "gte" : "2011-07-02",
                        "lte" : "2011-09-01"
                    }
                }
            }
        }
    },
    "aggs" : {
        "doc_id_groups" : {
            "terms" : {
                "field" : "doc_id",
                "size" : "10",
                "order" : {
                    "top_score" : "desc"
                }
            },
            "aggs" : {
                "top_score" : {
                    "max" : {
                        "script" : "_score"
                    }
                },
                "docs" : {
                    "top_hits" : {
                        "size" : 1,
                        "sort" : {
                            "version" : {
                                "order" : "desc"
                            }
                        },
                        "fields" : ["doc_id", "version", "date"]
                    }
                }
            }
        }
    }
}

Response:

{
  "took": 12,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 2,
    "max_score": 0,
    "hits": []
  },
  "aggregations": {
    "doc_id_groups": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 0,
      "buckets": [
        {
          "key": 123,
          "doc_count": 2,
          "docs": {
            "hits": {
              "total": 2,
              "max_score": null,
              "hits": [
                {
                  "_index": "test_index",
                  "_type": "test",
                  "_id": "3",
                  "_score": null,
                  "fields": {
                    "date": [
                      "2011-08-01"
                    ],
                    "doc_id": [
                      123
                    ],
                    "version": [
                      3
                    ]
                  },
                  "sort": [
                    3
                  ]
                }
              ]
            }
          },
          "top_score": {
            "value": 1
          }
        }
      ]
    }
  }
}
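
To make it clearer what the query computes, here is a rough in-memory sketch of the same logic in Python: apply the inclusive date range, group by `doc_id`, and keep the document with the highest `version` in each group. The function name `latest_versions_in_range` is made up for illustration, and the sample data is copied from the documents above. This is not how Elasticsearch executes the aggregation, only an equivalent result on this data.

```python
from datetime import date

# Sample documents copied from the answer (only the fields the query uses).
DOCS = [
    {"doc_id": 123, "version": 2, "date": "2011-09-01"},
    {"doc_id": 123, "version": 4, "date": "2011-07-01"},
    {"doc_id": 123, "version": 1, "date": "2011-10-01"},
    {"doc_id": 123, "version": 3, "date": "2011-08-01"},
]


def latest_versions_in_range(docs, date_from, date_to):
    """Return {doc_id: doc} holding the highest-version doc per doc_id in the range."""
    lo, hi = date.fromisoformat(date_from), date.fromisoformat(date_to)
    latest = {}
    for doc in docs:
        # Counterpart of the range filter (gte and lte are both inclusive).
        if not lo <= date.fromisoformat(doc["date"]) <= hi:
            continue
        # Counterpart of the terms bucket on doc_id plus
        # top_hits with size 1 sorted by version descending.
        best = latest.get(doc["doc_id"])
        if best is None or doc["version"] > best["version"]:
            latest[doc["doc_id"]] = doc
    return latest


result = latest_versions_in_range(DOCS, "2011-07-02", "2011-09-01")
print(result[123]["version"])  # 3
```

Only versions 2 and 3 fall inside the range, so version 3 wins, which matches the response above.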
jay
  • Thank you, this should work. But why do I need `"order" : { "top_score" : "desc" }` and the `top_score` aggregation? I get the expected result even without them. – Taras Kohut Sep 21 '16 at 13:25
  • You are right. That sort has nothing to do with getting the latest version. You can remove it. – jay Sep 21 '16 at 15:40
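
Following the two comments above, a simplified search body without the `order` clause and the `top_score` sub-aggregation might look like the sketch below. It is written as a small Python helper that builds the body as a dict; the helper name `build_latest_version_query` and the `max_docs` parameter are assumptions for illustration. It deliberately keeps the `filtered` query and `fields` syntax used in the answer; Elasticsearch 5.0 and later removed the `filtered` query, and there the same range clause would go under `bool` / `filter` instead (not shown here).

```python
import json


def build_latest_version_query(date_from, date_to, max_docs=10):
    """Build the search body from the answer, minus the top_score ordering."""
    return {
        "size": 0,  # no top-level hits, only aggregation results
        "query": {
            "filtered": {
                "query": {"match_all": {}},
                "filter": {
                    "range": {"date": {"gte": date_from, "lte": date_to}}
                },
            }
        },
        "aggs": {
            "doc_id_groups": {
                # One bucket per logical document.
                "terms": {"field": "doc_id", "size": max_docs},
                "aggs": {
                    "docs": {
                        # Highest version within each bucket.
                        "top_hits": {
                            "size": 1,
                            "sort": {"version": {"order": "desc"}},
                            "fields": ["doc_id", "version", "date"],
                        }
                    }
                },
            }
        },
    }


body = build_latest_version_query("2011-07-02", "2011-09-01")
print(json.dumps(body, indent=2))
```

The printed JSON can be sent as the request body of a search against the index, the same way as the original query.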