I am new to ES7 and trying to understand optimistic concurrency control.

I think I understand that when I get-request a document and send its _seq_no and _primary_term values in a later write-request to the same document, if the values differ, the write will be completely ignored.

But what happens to the document in the default case where I don't send the _seq_no and _primary_term values? Will the write go through even if it has older _seq_no and _primary_term values (therefore making the index inconsistent), or only be processed if the values are newer?

If the former, will the document eventually be consistent?

I'm trying to figure out if I need to send these values to get eventual consistency or if I get it for free without sending those values.

Amit
Morrowless
  • this answer might help you, https://stackoverflow.com/questions/56725207/elasticsearch-optimistic-locking/56729119#56729119 – Amit Mar 06 '20 at 13:22
  • I've read that before, and it helped me understand the concept but doesn't answer the question of, what happens if I don't send the _seq_no and _primary_term values. – Morrowless Mar 06 '20 at 13:30
  • Sure, give me some time; I'll explain it shortly :) – Amit Mar 06 '20 at 14:53
  • As promised, I've posted a detailed answer. Please go through it and let me know if you have further questions :-) – Amit Mar 06 '20 at 20:21

1 Answer

It's a great distributed-systems question. Let me break the problem into sub-parts for readability, and first explain what _seq_no and _primary_term are, since the ES site doesn't explain them in much detail.

  1. _seq_no is an incrementing counter that the primary shard assigns to each write operation (index, update, delete); a document carries the _seq_no of the last operation that changed it. For example, in a fresh single-shard index the first indexing operation gets 0, the next update gets 1, a following delete gets 2, and so on. Read operations don't change it.
  2. _primary_term is also an incrementing counter, but it changes only when a replica shard is promoted to primary after a network or other failure. As long as the cluster stays healthy it doesn't change; each promotion increases it by one.
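
The two counters can be pictured with a toy model. This is only a sketch: `ToyShard` and its methods are hypothetical names, not Elasticsearch code, and it assumes the sequence number is a shard-wide counter starting at 0 and the primary term starts at 1.

```python
from dataclasses import dataclass, field


@dataclass
class ToyShard:
    """Hypothetical stand-in for a primary shard; not Elasticsearch code."""
    primary_term: int = 1   # assumed to change only when a new primary takes over
    next_seq_no: int = 0    # assumed shard-wide counter, one value per write
    docs: dict = field(default_factory=dict)

    def write(self, doc_id, source):
        # Every index/update/delete consumes the next sequence number.
        seq_no = self.next_seq_no
        self.next_seq_no += 1
        self.docs[doc_id] = {
            "_source": source,
            "_seq_no": seq_no,
            "_primary_term": self.primary_term,
        }
        return self.docs[doc_id]

    def promote_new_primary(self):
        # Models a replica being promoted after the old primary fails.
        self.primary_term += 1


shard = ToyShard()
first = shard.write("a", {"n": 1})
second = shard.write("a", {"n": 2})
other = shard.write("b", {"n": 1})   # a different document shares the counter
shard.promote_new_primary()
after_failover = shard.write("a", {"n": 3})
```

In this model the write after the simulated failover carries a higher _primary_term, while _seq_no simply keeps counting.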

Coming to the first question,

Q:- What happens to the document in the default case where I don't send the _seq_no and _primary_term values?

Ans:- You can run into the lost update problem. Suppose the document holds a counter whose value is 1, and two requests read it at the same time and each tries to increment it by 1. When you don't send _seq_no and _primary_term, ES does no conflict check and assigns the new sequence number itself. The primary shard processes the two writes one after the other, each overwriting the document with the value 2, so the counter ends at 2 instead of 3. To prevent this, send the _seq_no and _primary_term you read along with the write (the if_seq_no and if_primary_term request parameters). ES then compares them with the document's current values and rejects the request with a version conflict if they differ, so the client can re-read and retry. For use cases that cannot tolerate lost updates, always send these values explicitly.
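
The counter scenario can be replayed with a small in-memory simulation. It is a sketch, not the Elasticsearch client: `ToyDoc` and `VersionConflict` are hypothetical names, and only the keyword arguments `if_seq_no` and `if_primary_term` are borrowed from the real index API's request parameters.

```python
class VersionConflict(Exception):
    """Toy analogue of the version conflict error a conditional write can hit."""


class ToyDoc:
    """Hypothetical single-document store; not the Elasticsearch client."""

    def __init__(self, source):
        self.source = dict(source)
        self.seq_no = 0
        self.primary_term = 1

    def get(self):
        return dict(self.source), self.seq_no, self.primary_term

    def index(self, source, if_seq_no=None, if_primary_term=None):
        conditional = if_seq_no is not None or if_primary_term is not None
        current = (self.seq_no, self.primary_term)
        if conditional and (if_seq_no, if_primary_term) != current:
            raise VersionConflict(f"expected {(if_seq_no, if_primary_term)}, found {current}")
        self.source = dict(source)
        self.seq_no += 1  # every accepted write gets a new sequence number


# Unconditional writes: both clients read 1, both write 2, one increment is lost.
doc = ToyDoc({"counter": 1})
read_a, _, _ = doc.get()
read_b, _, _ = doc.get()
doc.index({"counter": read_a["counter"] + 1})
doc.index({"counter": read_b["counter"] + 1})
unconditional_result = doc.source["counter"]

# Conditional writes: the second client hits a conflict, re-reads, and retries.
doc = ToyDoc({"counter": 1})
src_a, seq_a, term_a = doc.get()
src_b, seq_b, term_b = doc.get()
doc.index({"counter": src_a["counter"] + 1}, if_seq_no=seq_a, if_primary_term=term_a)
conflict_seen = False
try:
    doc.index({"counter": src_b["counter"] + 1}, if_seq_no=seq_b, if_primary_term=term_b)
except VersionConflict:
    conflict_seen = True
    src_b, seq_b, term_b = doc.get()
    doc.index({"counter": src_b["counter"] + 1}, if_seq_no=seq_b, if_primary_term=term_b)
conditional_result = doc.source["counter"]
```

The retry after the conflict is the client's responsibility in this sketch; the store itself only refuses the stale write.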

Q:- I'm trying to figure out if I need to send these values to get eventual consistency or if I get it for free without sending those values.

Answer:- These values are about concurrency control and have nothing to do with eventual consistency. In ES, writes always go to the primary shard first, but reads can be served by any replica, which may briefly return stale data; that is what makes ES eventually consistent.
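
The difference between the two ideas can be sketched with a toy primary/replica pair. This mirrors the description above and is not a model of Elasticsearch's actual replication protocol; `ToyReplicatedDoc` and `catch_up` are hypothetical names.

```python
class ToyReplicatedDoc:
    """Hypothetical primary/replica pair; only illustrates a lagging read copy."""

    def __init__(self, value):
        self.primary = value
        self.replica = value
        self.pending = []  # writes accepted by the primary, not yet visible on the replica

    def write(self, value):
        # Writes always land on the primary first.
        self.primary = value
        self.pending.append(value)

    def read(self, from_replica=True):
        return self.replica if from_replica else self.primary

    def catch_up(self):
        # The replica applies the pending writes in order and converges.
        for value in self.pending:
            self.replica = value
        self.pending.clear()


doc = ToyReplicatedDoc(1)
doc.write(2)
stale_read = doc.read()   # replica has not caught up yet
doc.catch_up()
fresh_read = doc.read()   # replica now agrees with the primary
```

Sending _seq_no and _primary_term would not change what this replica read returns; the replica converges on its own once it catches up.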

Important read

https://www.elastic.co/blog/elasticsearch-sequence-ids-6-0

Amit
  • Thanks for the detailed answer. I now realize I've been immensely confused about concurrency and eventual consistency. I'm using ES as a secondary db and indexing data is based off of my primary db which always has consistent data by handling requests in a transaction, so I don't need to worry about sending seq and term values. – Morrowless Mar 07 '20 at 02:19
  • @Morrowless, yes, ES is mainly used in such scenarios. You only need to send both params when you can't afford a lost update, such as a counter update. Glad you finally understood the crux of it and that I was helpful; it took me quite some time to write a concise answer, which finally paid off :-) – Amit Mar 07 '20 at 02:32