I would like to update an ElasticSearch document while keeping the document's version the same. I'm using version_type=external, as described in the versioning section of the index_ documentation. Updating a document with another of the same version is normally prevented, as that section states: "If the value provided is less than or equal to the stored document’s version number, a version conflict will occur and the index operation will fail."

The reason I want to keep the version unaltered is that I do not create a new version of my object (stored in my database) when someone adds new tags to it, but I would still like the new tags to show up in my ElasticSearch index. Is this possible with ElasticSearch?

I tried deleting the document and then adding a new document with the same Id and Version but that still gives me the following exception:

VersionConflictEngineException[[myindex][2] [mytype][6]: version conflict, current 1, provided 1]

Just for reference, I'm using PHP Elastica (with methods $type->deleteDocument($doc); and $type->addDocument($doc);) but this question should apply to ElasticSearch in general.

RayOnAir

1 Answer

The length of time for which elasticsearch keeps information about deleted documents is controlled by the index.gc_deletes parameter. By default this time is 1m. So, theoretically, you can decrease this time to 0s, wait for a second, delete the document, index a new document with the same version, and set index.gc_deletes back to 1m. But at the moment that would work only on master due to a bug. If you are using an older version of elasticsearch, you will not be able to change index.gc_deletes without closing the index first.
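The sequence above can be sketched in Python. This is only an illustration under stated assumptions: `client` is assumed to behave like an elasticsearch-py 7.x client (`indices.put_settings`, `delete` and `index` with `body=`, `version=` and `version_type=`), the function name and its parameters are hypothetical, and other client or server versions may need different arguments (for example a document type). The bug and the index-closing limitation mentioned above still apply.

```python
import time


def reindex_with_same_version(client, index, doc_id, document, version,
                              default_gc_deletes="1m", wait_seconds=1.0):
    """Delete a document and index it again under the same external version.

    Assumption: `client` exposes the elasticsearch-py 7.x style calls used
    below. This changes index.gc_deletes for the whole index while it runs.
    """
    def set_gc_deletes(value):
        client.indices.put_settings(
            index=index, body={"index": {"gc_deletes": value}}
        )

    # Stop keeping version information about deleted documents.
    set_gc_deletes("0s")
    try:
        # Wait briefly, as the answer suggests, before deleting.
        time.sleep(wait_seconds)
        client.delete(index=index, id=doc_id)
        # Index the new content with the unchanged external version.
        client.index(index=index, id=doc_id, body=document,
                     version=version, version_type="external")
    finally:
        # Always restore the previous setting, even if a call failed.
        set_gc_deletes(default_gc_deletes)
```

The `finally` clause is there so the index does not stay at `0s` if the delete or index call raises.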

There is a good blog post on the elasticsearch.org web site that describes in detail how versions are handled by elasticsearch.

Kevin Panko
imotov
  • Thanks for sharing this solution. The problem I see with it is that it would change the index.gc_deletes configuration for the entire index, reintroducing the risk that the 1-minute default guards against (docs that are meant to stay deleted may be reinserted if a prior version is submitted after deletion). Question: Why is it that the ElasticSearch versioning system does not allow updating a doc with another of the same version? Wouldn't it be sufficient not to allow updates to docs of older versions? – RayOnAir Jul 27 '13 at 22:45
  • It's not feasible because Elasticsearch uses the version internally to keep track of updates. So, the version has to increase with each update. What you could do is multiply your external version by some number (let's say 10000) before passing it to Elasticsearch as a version. So, external version 25 would become 250000. Then, when you need to update a document without changing the external version, you just omit the version on the URL, and the internal version will get incremented to 250001; on the next external update, it will become 260000, and so on. Version is represented as a Java long, so there is some room. – imotov Jul 27 '13 at 23:19
  • Thanks, that could be an option. I'm also considering saving the external version as a field of my document, so that a new document is added to the index only if the version in the current document's field is lower than or equal to the new doc's. In that scenario, the document version type would be internal all the time. The problem with this is that between the time I check the version field and the time I add the new doc, another doc with the same id might have been indexed. Is there a way to check a field and add the doc within a sort of transaction? – RayOnAir Jul 29 '13 at 15:38
  • Yes, you can update atomically using internal versions and this is exactly what [update](http://www.elasticsearch.org/guide/reference/api/update/) operation is doing. – imotov Jul 29 '13 at 15:47
  • Yep! I somehow did not think about that... I will go for that solution then. To recap: before any update, I get the currently indexed doc and compare its "ext-version" data field with the new doc's. If the new doc's is higher or equal, I add the new doc, setting its version equal to the current doc's version (the Elasticsearch internal version). If no other version was added to the index in the meantime, the add goes through; otherwise it fails. Then one could retry the whole process on conflict. – RayOnAir Jul 29 '13 at 23:07
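
The multiplication scheme suggested in the comments comes down to simple arithmetic, sketched here in Python. The helper names and the `VERSION_STEP` constant are hypothetical; only the factor 10000 and the example values come from the comment.

```python
VERSION_STEP = 10_000  # multiplier from the comment; any fixed factor works


def to_es_version(external_version: int) -> int:
    """Version to pass to Elasticsearch for a given external version."""
    return external_version * VERSION_STEP


def to_external_version(es_version: int) -> int:
    """Recover the external version from the stored version."""
    return es_version // VERSION_STEP


def unversioned_updates(es_version: int) -> int:
    """Number of updates made since the last external version change."""
    return es_version % VERSION_STEP


stored = to_es_version(25)       # 250000
stored += 1                      # an update that omits the version: 250001
assert to_external_version(stored) == 25
assert to_es_version(26) > stored  # the next external update is still accepted
```

One consequence of the arithmetic: with a factor of 10000, at most 9999 unversioned updates fit between two consecutive external versions. After that the stored version reaches the value the next external version maps to, and that external update would be rejected as a conflict.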
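
The recap in the last comment (read, compare the "ext-version" field, write against the version that was read, retry on conflict) can be illustrated with a small in-memory model. `ToyIndex`, `VersionConflict` and `upsert_if_not_older` are hypothetical names, and the class only imitates internal versioning; it is not the Elasticsearch or Elastica API.

```python
class VersionConflict(Exception):
    """Raised when the expected version does not match the stored one."""


class ToyIndex:
    """In-memory stand-in for an index with internal versioning."""

    def __init__(self):
        self._docs = {}  # doc_id -> (internal_version, source)

    def get(self, doc_id):
        """Return (internal_version, source), or None if the doc is missing."""
        return self._docs.get(doc_id)

    def index(self, doc_id, source, expected_version):
        """Store `source` only if the stored version equals expected_version."""
        current = self._docs.get(doc_id)
        current_version = current[0] if current else 0
        if expected_version != current_version:
            raise VersionConflict(
                f"current {current_version}, provided {expected_version}"
            )
        self._docs[doc_id] = (current_version + 1, dict(source))
        return current_version + 1


def upsert_if_not_older(index, doc_id, new_doc, max_retries=3):
    """Write new_doc unless the stored doc has a higher "ext-version".

    Returns True if the doc was written and False if it was stale.
    """
    for _ in range(max_retries):
        current = index.get(doc_id)
        if current is None:
            expected = 0  # the toy model treats a missing doc as version 0
        else:
            expected, source = current
            if new_doc["ext-version"] < source["ext-version"]:
                return False  # an older external version never overwrites
        try:
            index.index(doc_id, new_doc, expected_version=expected)
            return True
        except VersionConflict:
            continue  # someone else wrote in between: re-read and retry
    raise VersionConflict("gave up after repeated conflicts")
```

For example, writing `{"ext-version": 1, "tags": []}` and then `{"ext-version": 1, "tags": ["new"]}` under the same id succeeds both times, because an equal external version is accepted, while a later write with `"ext-version": 0` is skipped.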