
I've read the blog post on ES regarding versioning.

However, I'd like to be able to get the previous _source documents from an update.

For example, let's say I have this object:

{
    "name": "John",
    "age": 32,
    "job": "janitorial technician"
}
// this becomes version 1

And I update it to:

{
    "name": "John",
    "age": 32,
    "job": "president"
}
// this becomes version 2

Then, through versioning in ES, would I be able to get the previous job property of the object? I've tried this:

curl -XGET "localhost:9200/index/type/id?version=1"

but that just returns the most up-to-date _source object (the one where John is president).

I'd actually like to implement a version-diff feature much like the one Stack Overflow has. (BTW, I'm using Elasticsearch as my main database. If there's a way to do this with another NoSQL database, I'd be happy to try it out, preferably one that integrates well with ES.)

Amnon Shochot
swatkins
  • Did you find a solution? I went with option 1 from DrTech's answer but ran into problems searching it. Someone else suggested the second option, but I'm having trouble building that array for the index with Laravel Elasticquent. – jones Jan 10 '16 at 05:54
  • @jones It's been a while since I worked on this project, but I implemented DrTech's #3 solution from below. It worked flawlessly for me. Each time you update an object, save the old version first in a different index. Then, you can just query based on whatever your unique identifier is. – swatkins Jan 11 '16 at 14:54

1 Answer


No, you can't do this using the built-in versioning. All it does is store the current version number, to prevent you from applying updates out of order.
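
To make concrete what the version number is for, here is a minimal Python sketch of the kind of check it enables. It is an in-memory simulation, not Elasticsearch code, and `VersionedStore` and `VersionConflict` are hypothetical names invented for this illustration. Note that only the latest source is kept, which is why an older version cannot be fetched back.

```python
class VersionConflict(Exception):
    """Raised when a write is based on a stale version (hypothetical name)."""


class VersionedStore:
    """Toy in-memory store that keeps only the latest source plus a counter."""

    def __init__(self):
        self._docs = {}  # doc_id -> (version, source)

    def get(self, doc_id):
        return self._docs[doc_id]

    def put(self, doc_id, source, expected_version=None):
        current_version = self._docs.get(doc_id, (0, None))[0]
        if expected_version is not None and expected_version != current_version:
            raise VersionConflict(
                f"expected version {expected_version}, found {current_version}"
            )
        new_version = current_version + 1
        # The old source is overwritten here; nothing about it is retained.
        self._docs[doc_id] = (new_version, source)
        return new_version


store = VersionedStore()
store.put("john", {"name": "John", "age": 32, "job": "janitorial technician"})
store.put("john", {"name": "John", "age": 32, "job": "president"}, expected_version=1)
```

A second writer that still believes the document is at version 1 would now get a `VersionConflict` instead of silently overwriting the newer data.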

If you wanted to keep multiple versions available, then you'd have to implement that yourself. Depending on how many versions you are likely to want to store, you could take three approaches:

For low volume changes:

1) store older versions within the same document

{
    "text": "foo bar",
    "date": "2011-11-01",
    "previous": [
        { "date": "2011-10-01", "content": { "text": "Foo Bar" } },
        { "date": "2011-09-01", "content": { "text": "Foo-bar!" } }
    ]
}
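
One way to maintain a document of this shape is to snapshot the current fields into `previous` each time you update it. The Python sketch below shows that step on plain dictionaries; `apply_update` is a hypothetical helper, and writing the result back to the index is left out.

```python
import copy


def apply_update(doc, changes, new_date):
    """Return a new document with the old state pushed onto `previous`."""
    updated = copy.deepcopy(doc)
    history = updated.pop("previous", [])
    old_date = updated.pop("date", None)
    # Whatever remains after removing `date` and `previous` is the old content.
    snapshot = {"date": old_date, "content": updated}
    new_doc = {**updated, **changes, "date": new_date}
    # Newest-first ordering, matching the example above.
    new_doc["previous"] = [snapshot] + history
    return new_doc


doc = {"text": "Foo-bar!", "date": "2011-09-01"}
doc = apply_update(doc, {"text": "Foo Bar"}, "2011-10-01")
doc = apply_update(doc, {"text": "foo bar"}, "2011-11-01")
```

After these two updates, `doc` has the same shape as the example document, with the two older texts listed under `previous`. The document grows with every edit, which is why this approach suits low-volume changes.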

For high volume changes:

2) store each version as a separate document and add a current flag:

{
    "doc_id": 123,
    "version": 3,
    "text": "foo bar",
    "date": "2011-11-01",
    "current": true
}

{
    "doc_id": 123,
    "version": 2,
    "text": "Foo Bar",
    "date": "2011-10-01",
    "current": false
}
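
With this layout, each update involves two writes: the old document has its flag cleared, and a new document is indexed with the next version number. The Python sketch below builds both documents, plus a query body that returns only the live version. `next_version` and `current_only_query` are hypothetical helper names. The `bool`/`filter`/`term` structure is standard query DSL, but the field names are simply taken from the example above.

```python
def next_version(current_doc, changes, new_date):
    """Return (retired_doc, new_doc) for the current-flag scheme."""
    retired = {**current_doc, "current": False}
    new_doc = {
        **current_doc,
        **changes,
        "version": current_doc["version"] + 1,
        "date": new_date,
        "current": True,
    }
    return retired, new_doc


def current_only_query(doc_id):
    """Query body that matches only the live version of one logical document."""
    return {
        "query": {
            "bool": {
                "filter": [
                    {"term": {"doc_id": doc_id}},
                    {"term": {"current": True}},
                ]
            }
        }
    }


v2 = {"doc_id": 123, "version": 2, "text": "Foo Bar",
      "date": "2011-10-01", "current": True}
retired, live = next_version(v2, {"text": "foo bar"}, "2011-11-01")
```

Both documents have to be written back. The two writes are not atomic, so readers may briefly see zero or two "current" documents for the same `doc_id`.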

3) Same as (2) above, but store the old versions in a separate index. That keeps your "live" index, the one used for the majority of your queries, small and fast.
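
As a rough sketch of option 3, the Python function below builds a bulk request body that first copies the old source into a history index and then overwrites the document in the live index. The index names `docs` and `docs_history`, the composite `_id` format and the function name are all assumptions made for illustration. Only the newline-delimited action/source layout comes from the bulk API itself.

```python
import json


def build_archive_bulk(doc_id, old_source, old_version, new_source,
                       live_index="docs", history_index="docs_history"):
    """Build a bulk body that archives the old source, then overwrites the live one."""
    lines = [
        # The archive copy gets a composite ID so that every version is kept.
        {"index": {"_index": history_index, "_id": f"{doc_id}-v{old_version}"}},
        {**old_source, "doc_id": doc_id, "version": old_version},
        # The live index holds exactly one document per doc_id.
        {"index": {"_index": live_index, "_id": str(doc_id)}},
        new_source,
    ]
    # The bulk API expects newline-delimited JSON ending with a newline.
    return "\n".join(json.dumps(line) for line in lines) + "\n"


body = build_archive_bulk(
    123,
    {"text": "Foo Bar", "date": "2011-10-01"},
    2,
    {"text": "foo bar", "date": "2011-11-01"},
)
```

The resulting string can be sent to the `_bulk` endpoint with an `application/x-ndjson` content type. The history for one document can then be fetched by filtering the history index on `doc_id`.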

DrTech
  • Thank you for taking the time to answer this. I've actually found that out from some more reading. I had found solution #2 on some other sites and was going to go with that. But I think your solution 3 is brilliant. Keep the main index clean and clutter-free, but still be able to easily access the previous versions. Great idea! Thanks! – swatkins Nov 22 '11 at 14:31
  • @swatkins Could you please link to the other sites that were dealing with solution 2? – Konrad Reiche Jul 22 '13 at 21:28
  • Depending on your use case, you might need to add a "timestamp" field to keep track of the date of the last update. With that, you can retrieve only the objects updated after a particular date. – Ronan Quillevere Jan 08 '16 at 17:34
  • You proposed good solutions. I chose the first option but have a problem with searching: how do I search inside previous? – jones Jan 09 '16 at 11:40
  • @jones you are looking for a nested mapping - look here for examples: https://www.elastic.co/guide/en/elasticsearch/reference/current/nested.html – datashaman Jul 19 '16 at 17:01
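
For readers with the same question, here is a hedged Python sketch of what a nested mapping and a matching query body could look like for the option 1 document. The field names follow that example, and `history_mapping` and `search_previous` are hypothetical helpers. The mapping uses the typeless form of recent Elasticsearch versions, so older releases that still use mapping types would need the type name added.

```python
def history_mapping():
    """Index mapping that declares `previous` as a nested field."""
    return {
        "mappings": {
            "properties": {
                "text": {"type": "text"},
                "date": {"type": "date"},
                "previous": {
                    "type": "nested",
                    "properties": {
                        "date": {"type": "date"},
                        "content": {
                            "properties": {"text": {"type": "text"}}
                        },
                    },
                },
            }
        }
    }


def search_previous(text):
    """Query body that matches documents whose older versions contain `text`."""
    return {
        "query": {
            "nested": {
                "path": "previous",
                "query": {"match": {"previous.content.text": text}},
            }
        }
    }


query = search_previous("Foo-bar")
```

Without the nested type, the entries of `previous` are flattened into the parent document, so a query cannot require that a date and a text belong to the same old version.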