Questions tagged [consensus]

Consensus is the problem of reaching agreement among the members of a group. In computing, it means agreeing on a value that the nodes participating in a cluster need during a computation. Reaching consensus in a distributed environment is challenging and, under certain conditions, not even possible. Consensus algorithms are often used to replicate a state machine, a general approach for enhancing fault tolerance.

In the context of distributed computing, consensus is a fundamental problem that has received intense research attention, because problems such as atomic compare-and-swap (CAS) registers or atomic transaction commit can be reduced to it. Furthermore, consensus is the foundation for applications such as leader election, clock synchronisation and state machine replication. Although the problem can be explained in simple words, solving it is far more subtle and, under certain circumstances, not even possible.

In general, consensus is the problem of reaching agreement among the members of a group. In computing, it means agreeing on a value that the nodes participating in a cluster need during a computation. This may not sound complicated, but it becomes so because in a distributed system different kinds of unpredictable partial failures, such as:

  • loss of packets
  • duplication/reordering of packets
  • arbitrarily delayed packet delivery
  • pause or even crash of a node

can and will happen, which makes the problem hard to solve. A consensus algorithm therefore needs to satisfy certain properties:

  • Agreement: All decisions by non-faulty nodes must be the same.
  • Integrity: No node decides twice.
  • Validity: If a node decides on value v, then v was proposed by some node.
  • Termination: Every non-faulty node eventually decides on a value.

Termination is a liveness property and requires the algorithm to be fault-tolerant. Without this property, an algorithm could simply use a predefined leader that decides on values (as in two-phase commit, 2PC). But if that leader failed, the algorithm would no longer make progress, hence the termination requirement. Agreement and integrity are safety properties and form the core of a consensus algorithm. Validity rules out the trivial solution in which nodes always decide on the same predefined value and thereby reach consensus by definition.
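
The four properties can be checked mechanically against a recorded run. The following is a minimal sketch, not part of any real library: the function name `check_properties` and the trace format (a list of proposed values plus a map from node id to the values that node decided) are assumptions made for illustration. Since termination is a liveness property, a finite trace can only show whether every non-faulty node had decided by the end of the recording, which is reported as `all_decided`.

```python
from typing import Dict, Hashable, Iterable, List, Set


def check_properties(proposed: Iterable[Hashable],
                     decisions: Dict[str, List[Hashable]],
                     non_faulty: Set[str]) -> Dict[str, bool]:
    """Check the consensus properties on one recorded run.

    proposed   -- all values proposed by any node (hypothetical trace format)
    decisions  -- node id -> values that node decided, in order
    non_faulty -- ids of the nodes that did not fail during the run
    """
    proposed_set = set(proposed)

    # Integrity: no node decides more than once.
    integrity = all(len(values) <= 1 for values in decisions.values())

    # Validity: every decided value was proposed by some node.
    validity = all(v in proposed_set
                   for values in decisions.values() for v in values)

    # Agreement: all values decided by non-faulty nodes are identical.
    decided_by_correct = {v for node, values in decisions.items()
                          if node in non_faulty for v in values}
    agreement = len(decided_by_correct) <= 1

    # Finite-trace stand-in for termination: every non-faulty node
    # has decided by the end of the recording.
    all_decided = all(len(decisions.get(node, [])) >= 1
                      for node in non_faulty)

    return {"agreement": agreement, "integrity": integrity,
            "validity": validity, "all_decided": all_decided}


if __name__ == "__main__":
    run = {"n1": ["a"], "n2": ["a"], "n3": []}  # n3 crashed before deciding
    print(check_properties(["a", "b"], run, non_faulty={"n1", "n2"}))
```

A run in which two non-faulty nodes decide different values fails `agreement`, and a run in which a node decides a value nobody proposed fails `validity`.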

Whether a solution, that is an actual algorithm, exists for the consensus problem depends on the system model. One therefore has to distinguish between synchronous and asynchronous system models¹. Because sending messages mostly relies on a shared communication channel, at least the communication is asynchronous, so real-world applications fit the asynchronous model better. Unfortunately, for that environment there is an impossibility proof by Fischer et al. (1), known as the FLP impossibility result. It shows that in a fully asynchronous model no algorithm always reaches consensus if even a single node may fail (Byzantine failures are not even considered). By proving that every possible algorithm has an execution that never terminates, they showed that no algorithm can guarantee reaching agreement.

Nevertheless, practical solutions to this problem exist. Note that the impossibility result applies to fully asynchronous environments. Although real-world applications fit that model better, not all of its assumptions hold in practice (e.g. processors do maintain an internal clock). Relaxing the assumption of full asynchrony allows circumventing the FLP result. For example, assuming an upper bound on message delivery and using it as an unreliable failure detector turns the communication channel into a partially synchronous one, a system model in which the consensus problem is solvable. (2)
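
The idea of using a delivery bound as an unreliable failure detector can be sketched with heartbeats and a timeout. This is a minimal illustration under stated assumptions: the class name `TimeoutFailureDetector`, its methods, and passing the current time explicitly as `now` are all hypothetical choices, not the API of any real system. The detector is "unreliable" because a node that is merely slow gets suspected and is cleared again once its next heartbeat arrives.

```python
from typing import Dict, Set


class TimeoutFailureDetector:
    """Suspect any node whose last heartbeat is older than `timeout`."""

    def __init__(self, timeout: float) -> None:
        self.timeout = timeout                 # assumed upper bound on delay
        self.last_seen: Dict[str, float] = {}  # node id -> last heartbeat time

    def heartbeat(self, node: str, now: float) -> None:
        # Record that a heartbeat from `node` was received at time `now`.
        self.last_seen[node] = now

    def suspected(self, now: float) -> Set[str]:
        # A suspicion may be wrong: the node could be slow rather than crashed.
        return {node for node, seen in self.last_seen.items()
                if now - seen > self.timeout}


if __name__ == "__main__":
    fd = TimeoutFailureDetector(timeout=10)
    fd.heartbeat("a", now=0)
    fd.heartbeat("b", now=0)
    fd.heartbeat("a", now=9)
    print(fd.suspected(now=15))   # "b" is suspected: no heartbeat for 15 units
    fd.heartbeat("b", now=16)     # "b" was only slow; its heartbeat arrives
    print(fd.suspected(now=18))   # nobody is suspected any more
```

Passing `now` explicitly keeps the sketch deterministic; a real deployment would read a local clock instead, which is precisely the extra assumption that the fully asynchronous model does not grant.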

Algorithms that solve the consensus problem include Paxos, Raft, Viewstamped Replication and Zab.


¹ In synchronous models there are known bounds on message delivery and process execution time. In an asynchronous model, processors do not even maintain an internal clock, and hence no such bounds exist.

303 questions
8
votes
1 answer

Leader address/location in Raft

This may be a very simple question but I've not been able to find a good answer to this yet. Maybe someone can help me. Once a leader is elected - The clients will send all requests ONLY to the leader. Is this correct? Given that the location…
Soumya Simanta
  • 11,523
  • 24
  • 106
  • 161
7
votes
1 answer

How do nodes in a Raft cluster know what is the "majority"?

I am reading the Raft paper and following the secret life of data visualisation and it seems that the majority is crucial in Raft, both for leader election as well as append entry requests. My question is how do the nodes know the total number…
spygi
  • 412
  • 3
  • 10
7
votes
1 answer

Difference between atomic broadcast and consensus

Consensus is about all the machines coming to an agreement on a value. Atomic broadcast also says that a message emitted by a process should be agreed on either by all or by none. So what is the difference?
ffff
  • 2,853
  • 1
  • 25
  • 44
7
votes
2 answers

Why is it legit to use no-op to fill gaps between paxos events?

I am learning Paxos algorithm (http://research.microsoft.com/en-us/um/people/lamport/pubs/paxos-simple.pdf) and there is one point I do not understand. We know that events follow a timely order, and it happens when, say, events 1-5 and 10 are…
OneZero
  • 11,556
  • 15
  • 55
  • 92
6
votes
2 answers

How does Raft preserve safety when a leader commits a log entry and crashes before informing followers of this commitment?

In my understanding, a leader sends AppendEntries RPCs to the followers, and if a majority of followers return success, the leader will commit this entry. It will commit this entry by applying it to its own state machine, and it will also return to the…
user534498
  • 3,926
  • 5
  • 27
  • 52
6
votes
3 answers

Consensus of Hyperledger Fabric

I'm new to Hyperledger Fabric. I'm reading the documentation for the latest version of Fabric, but I'm not clear about consensus in Fabric. Which consensus mechanism does Fabric use? And how does it work? Please explain.
Jony
  • 101
  • 5
6
votes
1 answer

Kafka leader election in multi-dc with an arbiter/witness/observer

I would like to deploy a Kafka cluster in two datacenters with the same number of nodes in each DC. The first DC is used in active mode while the second is in passive mode. For example, let's say that both datacenters have 3 nodes with 2 in-sync…
Nicolas Henneaux
  • 11,507
  • 11
  • 57
  • 82
6
votes
1 answer

In Corda, what data is sent to a non-validating notary service?

This question frequently comes up in conversations: When a Corda transaction is sent to a non-validating notary service for finalisation, what can the notary service see and deduce about the world?
Antony
  • 73
  • 5
6
votes
3 answers

How does a consensus algorithm guarantee consistency?

How does a consensus algorithm like Paxos "guarantee safety (freedom from inconsistency)" when two generals prove the "impossibility of designing algorithms to safely agree"? When I consider the simplest case of getting two servers to either (1)…
5
votes
3 answers

How do replicas coming back online in PAXOS or RAFT catch up?

In consensus algorithms like for example PAXOS and RAFT, a value is proposed, and if a quorum agrees, it's written durably to the data store. What happens to the participants that were unavailable at the time of the quorum? How do they eventually…
Markus Jevring
  • 832
  • 1
  • 11
  • 17
5
votes
2 answers

In Raft, when does a follower know an entry became committed? Can an out-of-date node win an election?

In Raft, if a log entry is replicated to a majority, it is considered committed on the leader. Does the leader then send a message to the followers to tell them that the entry has been committed? If not, how and when does a follower know that an entry has been committed? Another question: if an out of…
wang
  • 53
  • 5
5
votes
3 answers

“Gossip about gossip” protocols

Lately there has been much talk about the (patented) hashgraph consensus algorithm, which claims to have very good complexity measures. See the whitepaper: https://swirlds.com/downloads/SWIRLDS-TR-2016-01.pdf A central part of this approach is the so…
5
votes
1 answer

Is the Raft consensus algorithm a byzantine fault-tolerant (bft) algorithm?

Is the raft consensus algorithm a byzantine fault tolerant algorithm? How many (percentage of) nodes are required to reach agreement/consensus?
Nathan Aw
  • 545
  • 5
  • 18
5
votes
2 answers

What is the proper behaviour for a Paxos agent in this scenario?

I'm looking into Paxos and I'm confused about how the algorithm should behave in this contrived example. I hope the diagram below explains the scenario. A few points: Each agent acts as a proposer/acceptor/learner Prepare messages have form…
Allen George
  • 450
  • 7
  • 15
5
votes
1 answer

Two-phase commit: availability, scalability and performance issues

I have read a number of articles and got confused. Opinion 1: 2PC is very efficient, a minimal number of messages are exchanged and latency is low. Source: http://highscalability.com/paper-consensus-protocols-two-phase-commit Opinion 2: It is very…