
While studying Raft, I ran into a problem.

A Raft cluster has 5 servers; call them a, b, c, d, e. a is the leader, and everything is fine. Then a handles a client request and makes a log entry.

Scenario 1: b & c replicate the log entry, d & e don't. Then a & b crash. c has the log entry, d & e do not. The log entry is committed.

Scenario 2: b replicates the log entry; c, d, e don't. Then a & e crash. b has the log entry, c & d do not. The log entry is not committed.

How does Raft handle these cases?

Sujith Kumar
tlb

1 Answer


The statement "Then, a handles a client request, makes a log entry" should be extended to: "Then, a handles a client request, waits until at least 2 of (b, c, d, e) accept the request, [and then] makes a log entry."

Since there are five nodes - one leader and four followers - the majority requires three nodes: the leader and any two followers.

So, the leader adds the entry to its log when at least two followers have accepted the request.

When a follower accepts a request, it does not mean the request is committed. A follower will commit a request only after the leader tells it to do so.
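As a minimal sketch of this follower-side rule (all names here are illustrative assumptions, not a real library's API), the leaderCommit field of an AppendEntries call is what moves the follower's commit index forward:

```python
# Hedged sketch of the follower side of AppendEntries, following the Raft
# rule: commitIndex = min(leaderCommit, index of last new entry).

def handle_append_entries(log, commit_index, entries, leader_commit):
    """Accept entries from the leader, then advance the commit index.

    A follower never decides commitment on its own; it only advances its
    commit index up to what the leader reports, capped by its own log.
    """
    log = log + entries                      # accept (save) the new entries
    if leader_commit > commit_index:
        commit_index = min(leader_commit, len(log) - 1)
    return log, commit_index
```

Entries above the returned commit index are merely saved, not committed; only a later AppendEntries (or heartbeat) from the leader moves them into the committed region.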

Scenario 1: b & c replicate the log entry, d & e don't. Then a & b crash. c has the log entry, d & e do not. The log entry is committed.

From the context, "replicate" means that the entry was committed (in Raft terms). When a & b crash, a new election has to happen. As usual, a majority is needed to elect a new leader, so all three remaining nodes (c, d, e) will communicate with each other.

Raft guarantees that a node with the most up-to-date log wins an election. In our set (c, d, e), C has the most up-to-date log, hence C will be elected as the new leader. After the election, C will send out the record missing from D and E.
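The "most up-to-date log wins" rule comes from the voter-side check in the Raft paper (§5.4.1). A sketch, with illustrative function and parameter names:

```python
def candidate_log_is_up_to_date(cand_last_term, cand_last_index,
                                my_last_term, my_last_index):
    """A voter grants its vote only if the candidate's log is at least as
    up-to-date as its own: a higher last term wins outright; equal terms
    fall back to comparing log length (last index)."""
    if cand_last_term != my_last_term:
        return cand_last_term > my_last_term
    return cand_last_index >= my_last_index
```

In scenario 1, C's log is longer than D's and E's within the same term, so D and E grant C their votes, while C rejects theirs.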

Scenario 2: b replicates the log entry; c, d, e don't. Then a & e crash. b has the log entry, c & d do not. The log entry is not committed.

When only B (and the crashed leader) have the log entry, the record is not committed, since no majority accepted it. On failure, a new election will happen, and B will win it, same as in scenario 1.

A few notes:

  • in Raft, when an election happens, a node with the most up-to-date log wins it
  • a record may be in two states: proposed and committed. The committed state is reached only after a majority of nodes have the record. Even if a crash happens after the commit, at least one of the remaining nodes will have the record, hence that node will win the new election
  • it is interesting to consider what a client of a Raft cluster sees: if a client sends a request to the leader and the leader fails before returning a reply, the client does not know whether the record was recorded or not. This is a very important property: not knowing the outcome. The uncertainty arises because the client cannot tell what exactly went wrong, i.e. whether the cluster failed before or after committing the request
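The majority rule from the second note can be sketched from the leader's side (illustrative names; match_index holds, per server and including the leader itself, the highest log index known to be stored there; per Raft §5.4.2, only entries from the leader's current term are committed by counting replicas):

```python
def advance_commit_index(match_index, log_terms, current_term, commit_index):
    """Return the highest index N > commit_index whose entry is stored on a
    majority of servers and was created in the current term; otherwise
    leave commit_index unchanged."""
    for n in range(len(log_terms) - 1, commit_index, -1):
        stored_on = sum(1 for m in match_index if m >= n)
        if stored_on > len(match_index) // 2 and log_terms[n] == current_term:
            return n
    return commit_index
```

With five servers, an entry stored on a, b, c (scenario 1) clears the majority threshold of three, while one stored only on a and b (scenario 2) does not.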
AndrewR
  • Thanks, your answer lit things up for me. I'll read the docs again. – tlb Apr 17 '23 at 15:08
  • You are welcome! When you read the doc, note the parameter leaderCommit index. That is the confirmation for followers to mark a certain event as committed (hence permanent). While an event is in a log but not yet committed, it may be replaced with another one due to various failure scenarios. Raft guarantees that only committed events are immutable. – AndrewR Apr 17 '23 at 15:52
  • <1> The statement "Then, a handles a client request, waits until at least 2 of (b, c, d, e) accept the request, [and then] makes a log entry" should be "Then, a receives a client request", then, from the doc: "(The leader) appends the command to its log as a new entry, then issues AppendEntries RPCs in parallel to each of the other servers to replicate the entry." <2> Don't use the term "replicate"; use the simple word "save". – tlb Apr 18 '23 at 06:59
  • <3> Yes, there is an important field, leaderCommit. So scenario 1 can be split into 2 sub-scenarios. – tlb Apr 18 '23 at 06:59
  • <4> Scenario 1a: b & c save the log entry, d & e don't. Then a & b crash. c has the log entry, d & e do not. c has the log entry, and c knows the log entry is committed (because a has told it via AppendEntries with the leaderCommit parameter). This is standard committed. The log entry will be preserved and eventually executed. – tlb Apr 18 '23 at 07:01
  • <5> Scenario 1b: b & c save the log entry, d & e don't. Then a & b crash. c has the log entry, d & e do not. c has the log entry, but c does not know the log entry is committed (because a crashed without telling it via AppendEntries with the leaderCommit parameter). This is proposed, or weakly committed. c will be elected as the new leader (because it has the longest log). The log entry will be preserved and eventually executed. – tlb Apr 18 '23 at 07:01
  • <6> Scenario 2: b saves the log entry; c, d, e don't. Then a & e crash. b has the log entry, c & d do not. The log entry is not committed: the leader a did not save it on a majority of servers. Then b will be elected as the new leader. When b handles a new request with a new log entry, b will finally commit this earlier log entry too. – tlb Apr 18 '23 at 07:01
  • <7> My problem is actually with scenario 2. Because leader a crashes, and the log entry (only on 2 nodes, a & b) is not committed, it should be removed; that was my expectation at first. But from the client's view, the client hopes the command will be executed on a best-effort basis. So even if it is not committed at first, it may be committed eventually, and that's OK. – tlb Apr 18 '23 at 07:02
  • <8> What do people see in a cluster? A: it's stable, always available. B: if you send it an operation and it tells you OK, then the related state matches the OK result. If you send it an operation and it tells you it failed, then the related state matches the undone result. If you send it an operation and it gives you an exception, then you need to check the related state to know whether it really did it! – tlb Apr 18 '23 at 07:02
  • as an extra summary: the log has entries; some entries are immutable, namely the ones at and below the commit index, while the others may change. – AndrewR Apr 18 '23 at 17:43