
I read the following on the front page of etcd's website:

etcd is a strongly consistent, distributed key-value store that provides a reliable way to store data that needs to be accessed by a distributed system or cluster of machines. It gracefully handles leader elections during network partitions and can tolerate machine failure, even in the leader node.

What do they mean by "strongly" consistent? How does their consistency model relate to more formal, established definitions such as sequential consistency and linearizability?

Josh

2 Answers


"Consistency" as used in distributed systems is different from consistency as used in ACID transactions.

Consistency as described in Paxos, Raft, ZooKeeper, etc. (etcd is based on Raft) closely resembles the D (Durability) in ACID terminology. What they mean is that a transaction is committed once its data has been written to disk on a majority of nodes. This further implies that, as long as a majority of nodes is up, reads will get the latest data written.[1]

Say etcd runs as a 5-node cluster: a transaction is considered committed only when 3 nodes (a majority) have committed it.
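To make the majority arithmetic concrete, here is a minimal Python sketch. The helper names `majority`, `is_committed` and `tolerated_failures` are hypothetical, and the model deliberately ignores Raft terms, leader election and log matching; it only captures the "commit once a majority has it" rule.

```python
def majority(cluster_size: int) -> int:
    """Smallest number of nodes that forms a strict majority."""
    return cluster_size // 2 + 1

def is_committed(cluster_size: int, acks: int) -> bool:
    """Toy rule: a write counts as committed once a majority has persisted it."""
    return acks >= majority(cluster_size)

def tolerated_failures(cluster_size: int) -> int:
    """How many nodes can fail while a majority is still reachable."""
    return cluster_size - majority(cluster_size)

print([(n, majority(n), tolerated_failures(n)) for n in (3, 4, 5, 7)])
# [(3, 2, 1), (4, 3, 1), (5, 3, 2), (7, 4, 3)]
print(is_committed(5, 3), is_committed(5, 2))  # True False
```

The same formula shows why odd cluster sizes are usual: a 4-node cluster needs 3 acknowledgements yet tolerates only one failure, no better than 3 nodes.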

What you are referring to as sequential and linear consistency corresponds to the I (Isolation) property in ACID.

etcd has modes in which it provides "linear consistency" (linearizability), where reads and writes go through a majority of nodes every time. Achieving sequential consistency with high performance in a distributed system is a tough beast, although databases like H-Store had already solved it back in 2007.
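The reason majority reads and writes yield up-to-date reads is quorum intersection: any two majorities of the same cluster share at least one node. The Python sketch below checks that by brute force and shows a toy "newest version wins" read. It is a model of the quorum idea under stated assumptions (hypothetical helpers, integer versions per node), not etcd's actual read path.

```python
from itertools import combinations

def majorities(nodes):
    """All minimal majority quorums of the given node ids."""
    k = len(nodes) // 2 + 1
    return [set(c) for c in combinations(nodes, k)]

def every_pair_intersects(cluster_size):
    """True if any two majority quorums share at least one node."""
    quorums = majorities(list(range(cluster_size)))
    return all(a & b for a in quorums for b in quorums)

def quorum_read(versions, read_quorum):
    """Toy read: return the newest version held by any node in the read quorum."""
    return max(versions[node] for node in read_quorum)

# Version 7 was acknowledged by the write quorum {0, 1, 2}; nodes 3 and 4 still hold version 6.
versions = {0: 7, 1: 7, 2: 7, 3: 6, 4: 6}

print(all(every_pair_intersects(n) for n in range(1, 8)))                       # True
print(all(quorum_read(versions, q) == 7 for q in majorities(list(range(5)))))  # True
print(quorum_read(versions, {3, 4}))  # 6: a minority can miss the latest write
```

Every 3-node read quorum overlaps the write quorum in at least one node, so it sees version 7; only a minority read such as `{3, 4}` can come back stale.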

[1] https://raft.github.io/raft.pdf

WebServer

This is covered in detail in the "Guarantees Provided" section of the etcd documentation.

etcd offers linearizable reads and writes for single-key operations (such as get/set, but not watch). This means that when you write to a key, the request has to go through the Raft consensus protocol, and the write has to be committed to a majority of the member nodes to succeed. This ensures a "total order" in which operations execute on an etcd database.
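The "total order" comes from a single replicated log: every write gets a position in the log, and every replica applies entries strictly in that order, even if some replicas lag. The Python sketch below is a hypothetical toy (`Replica` and `ToyReplicatedLog` are illustrative names, not etcd or Raft APIs) and omits acknowledgements and elections entirely.

```python
from dataclasses import dataclass, field

@dataclass
class Replica:
    applied: list = field(default_factory=list)  # entries applied so far, in order
    state: dict = field(default_factory=dict)    # resulting key-value map

    def apply(self, entry):
        key, value = entry
        self.state[key] = value
        self.applied.append(entry)

class ToyReplicatedLog:
    """Hypothetical toy: one leader-owned log; replicas apply it strictly in index order."""

    def __init__(self, cluster_size):
        self.log = []
        self.replicas = [Replica() for _ in range(cluster_size)]

    def propose(self, key, value):
        # The leader appends the write; its log position is its place in the total order.
        self.log.append((key, value))
        return len(self.log)

    def catch_up(self, replica_id, upto=None):
        # A replica may lag, but it can only apply the next entry in log order.
        replica = self.replicas[replica_id]
        upto = len(self.log) if upto is None else min(upto, len(self.log))
        while len(replica.applied) < upto:
            replica.apply(self.log[len(replica.applied)])

cluster = ToyReplicatedLog(cluster_size=3)
cluster.propose("a", 1)
cluster.propose("b", 2)
cluster.catch_up(0)          # replica 0 applies both entries now
cluster.propose("a", 3)
cluster.catch_up(1, upto=1)  # replica 1 lags behind
for rid in range(3):
    cluster.catch_up(rid)    # eventually every replica catches up
print([r.applied == cluster.log for r in cluster.replicas])  # [True, True, True]
```

Regardless of when each replica catches up, all of them apply the same sequence and end in the same state.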

Similarly, if you read values by connecting to a particular member, you'll read values in the order they were written (a subsequent read on the same node won't return an older value). If you connect to the leader and read the value, you'll always get the latest value.
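The behaviour described above can be modelled with a member that serves reads from the prefix of the log it has applied. Because that prefix only grows, repeated reads on one member never go backwards, while a member that has applied everything (the leader, in this description) returns the latest value. The `Member` class is a hypothetical illustration of that description, not etcd's implementation.

```python
class Member:
    """Hypothetical member that serves reads from its locally applied log prefix."""

    def __init__(self, log):
        self.log = log          # shared list of (key, value) entries in commit order
        self.applied_index = 0  # number of entries applied locally; never decreases

    def advance(self, n):
        # Apply up to n more entries, never beyond the end of the log.
        self.applied_index = min(len(self.log), self.applied_index + n)

    def read(self, key):
        # Return the last value written to `key` within the applied prefix.
        value = None
        for k, v in self.log[:self.applied_index]:
            if k == key:
                value = v
        return value

log = [("x", 1), ("x", 2), ("x", 3)]
leader = Member(log)
leader.advance(len(log))
follower = Member(log)

observed = []
for step in (1, 0, 2):
    follower.advance(step)
    observed.append(follower.read("x"))
print(observed)          # [1, 1, 3]: stale at times, but never moving backwards
print(leader.read("x"))  # 3
```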

From Thoughtworks' Consistent Core article:

Linearizability is the strongest consistency guarantee where all the clients are guaranteed to see latest committed updates to data. Providing linearizability along with fault tolerance needs consensus algorithms like Raft, Zab or Paxos to be implemented on the servers.
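To relate this to the question's sequential vs. linear consistency, the Python sketch below brute-forces the usual definitions for a single register (initial value 0). Both require some legal sequential order of the operations; linearizability additionally requires that an operation which finished before another started comes first in real time, while sequential consistency only preserves each client's own order. The names (`Op`, `linearizable`, `sequentially_consistent`) are hypothetical, and the permutation search is only practical for tiny histories.

```python
from itertools import permutations
from typing import NamedTuple

class Op(NamedTuple):
    process: str  # which client issued the operation
    kind: str     # "w" = write, "r" = read
    value: int    # value written, or value the read returned
    start: float  # invocation time
    end: float    # response time

def legal(order, initial=0):
    """Does this sequential order make every read return the latest preceding write?"""
    current = initial
    for op in order:
        if op.kind == "w":
            current = op.value
        elif op.value != current:
            return False
    return True

def exists_valid_order(history, must_precede, initial=0):
    """Brute force: is there a legal total order that respects `must_precede`?"""
    n = len(history)
    for perm in permutations(range(n)):
        pos = {op_index: slot for slot, op_index in enumerate(perm)}
        respects = all(
            pos[a] < pos[b]
            for a in range(n)
            for b in range(n)
            if must_precede(history[a], history[b])
        )
        if respects and legal([history[i] for i in perm], initial):
            return True
    return False

def real_time(a, b):
    # Linearizability: if a finished before b started, a must come first.
    return a.end < b.start

def program_order(a, b):
    # Sequential consistency: only each client's own order must be preserved.
    return a.process == b.process and a.end < b.start

def linearizable(history):
    return exists_valid_order(history, real_time)

def sequentially_consistent(history):
    return exists_valid_order(history, program_order)

# P1 writes 1 and finishes; afterwards P2 reads the old value 0 (a stale read).
stale = [Op("P1", "w", 1, 0.0, 1.0), Op("P2", "r", 0, 2.0, 3.0)]
# Same timing, but P2 sees the new value.
fresh = [Op("P1", "w", 1, 0.0, 1.0), Op("P2", "r", 1, 2.0, 3.0)]

print(linearizable(stale), sequentially_consistent(stale))  # False True
print(linearizable(fresh), sequentially_consistent(fresh))  # True True
```

The stale history is the kind a lagging member can produce: it is sequentially consistent but not linearizable.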

ahmet alp balkan