1

I am trying to improve my knowledge in distributed databases and the various levels of consistency that could be achieved. First, let me define some terms I will use (please, tell me if I am wrong):

strong consistency: as reported by dr. Kleppmann in "Designing Data Intensive Applications", it is a synonym for "linearizability", a consistency level which makes a replicated data store to behave as if there were only a single copy of a data item and every operation on it takes place atomically.

replication factor: the number of copies of a data item.

Supposing I have a cluster made up of 3 nodes configured in strong consistency mode and a replication factor = 3 and the leader has successfully replicated a write to a data item X only to one of its followers, I have the following questions:

1. The write will be replicated to the second follower when it will return back online, isn't it?

In my opinion, the database can return "success" to the client application when the leader has successfully replicated the write at least to one of its followers. Indeed, it has reached a majority quorum for that write, so it has not to wait for the second follower to acknowledge the operation. However, given the value of the replication factor, writes will be applied in the same order (so, also that on X) on the second follower when it will be online.

2. If a client application tries to read X on the follower not yet updated, will it obtain a stale value?

As the database is working in strong consistency, it should not be possible to read a stale value. So, if the client cannot connect to the leader or the updated follower, it should get an error. This should be a consequence of the CAP Theorem.

Please, could anyone tell me if I am right and, if not, why? Thank you!

roberto
  • 13
  • 2

1 Answers1

1

The way you described the systems - having a leader and a success is when a majority of nodes accepted a write - this implies you are using single leader replication based on consensus. (please, ask if this requires explanation)

In a consensus based model, all nodes will be updated in the same order, but there is no guarantee when a specific node get updated. But it is guaranteed that if a write is accepted, then majority of nodes accepted it. The consensus protocol itself guarantees that all nodes will get all writes eventually.

I recommend to read a paper on Raft Algorithm - it's relatively straightforward and it covers all major aspects of consensus.

Two paragraphs above I said that for a given write two statements are true: majority of nodes accepted the write; and if some nodes are behind - they will eventually get updates. The question has does this eventuality works with strong consistency?

Consensus based system has two read modes: eventual consistent reads and strongly consistent read. I saw that many papers call the latter reads - linearizable reads.

Eventually consistent reads are simple - a reader goes to a random node, and they may or may not see the latest value. All good here.

Linearizable read is more complicated. To understand that, we should describe what the log is in a consensus based system. The log is the sequence of all events - every node eventually will have exactly same log - same events in the same order. So when we say - a write has been accepted - it means that majority of nodes appended that write-event into their logs.

Here is the algorithm to get a strongly consistent aka linearizable read:

  • a client has to connect to the leader
  • the client tells the leader that it requires a strongly consistent read
  • the leader proposes a new read-event to be appended to the log
  • this proposal is a standard operation in the consensus algorithm - the proposal is sent to all nodes
  • the leader waits till majority accepts this read-event; in your case as long as one followers accepts - the event is considered accepted
  • the leader appends the event to the log
  • as soon as the leader sees the read-event being appended to its log, the leader may conclude that it sees the latest possible picture of the world
  • the leader may now read its own log to find the data asked the client
  • the leader finds latest data and returns to the customer

The steps from above guarantee that the client sees the latest update.

AndrewR
  • 1,252
  • 8
  • 7
  • Thanks for your detailed explanation. Yes, I'm talking about single-leader replication. So, as I understand, in order to support strong consistency (linearizability) it is not a problem to have only 2 out of 3 nodes online (I can tolerate up to one node failure with this configuration). Related to my second question, Instead, if a client cannot connect to the leader, it can make no linearizable read and the system is unavailable to him, right? It cannot read a stale value because, otherwise, the read would not be linearizable. – roberto May 14 '22 at 08:35
  • yes, the read has to go through the leader and a follower. How I see it - when a client asks for a linearizable read - majority of nodes has to agree what is that value. And a leader is required to get that majority. And, as you said, you can't go to a followers as they are eventually consistent, which means stale values are possible – AndrewR May 14 '22 at 17:36
  • @AndrewR what if the node that went offline goes up and after some seconds goes again offline? Does the leader should cut it off from the partition? – DarkSkull Oct 23 '22 at 14:58
  • @DarkSkull leader usually talks to each node constantly; in raft case, one of reasons is to make sure no followers start election. Election is not a deal breaker, but it does slow down the system a bit. Basically, leader should keep trying to talk to a node, even if the node constantly is having issues. – AndrewR Oct 24 '22 at 02:04