4

In the paper "In Search of an Understandable Consensus Algorithm", Figure 8 shows a problem in (d) and (e): some old log entries may be overwritten and never come back.

In section 5.4.2, it says “To eliminate problems like the one in Figure 8, Raft never commits log entries from previous terms by counting replicas. Only log entries from the leader’s current term are committed by counting replicas; once an entry from the current term has been committed in this way, then all prior entries are committed indirectly because of the Log Matching Property”.

I'm confused about that part. How does it work in Figure 8? What will happen and what will not?

tyChen

2 Answers

5

Apply the rule to Figure 8:

Raft never commits log entries from previous terms by counting replicas.

So now we never commit log entries from previous terms. Let's see what happens in Figure 8 again. I modified Figure 8 to show the situation after applying the rule.

[Image: Figure 8, modified to show the rule applied]

(a) and (b) work the same.

Starting from (c), the log entry at index 2 was appended in term 2 back in step (a); I drew a yellow circle around it. So it is from a previous term. Thus the leader will not replicate that entry (the yellow 2 with my black cross), according to the rule. It must start replicating from the entry at index 3.

This rule is also mentioned in Figure 2, "Rules for Servers", leader's rule 3.1:

Send AppendEntries RPC with entries starting at nextIndex.

nextIndex is initialized to the leader's last log index + 1, so in (c) replication should start at log index 3, not index 2.

So for the hypothetical procedure in the original (c), it is impossible to append log entry 2 to a majority before log entry 3 (the pink one appended in term 4) is replicated to a majority, and (d) will not happen.
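
A minimal sketch of that initialization, assuming 1-based log indices as in the paper. The function name `init_next_index` and the list-of-terms log representation are hypothetical, chosen only for illustration. It covers only the value nextIndex takes right after an election win.

```python
def init_next_index(leader_log_terms, peers):
    """Return nextIndex for each peer right after an election win.

    leader_log_terms lists the term of each entry in the leader's log.
    Indices are 1-based, so the last log index equals the log length.
    """
    last_log_index = len(leader_log_terms)
    return {peer: last_log_index + 1 for peer in peers}


# S1's log in (c) at the moment it wins term 4: terms [1, 2] at indices 1 and 2.
next_index = init_next_index([1, 2], ["S2", "S3", "S4", "S5"])
print(next_index["S2"])  # 3
```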

UPDATE: 2020-12-04

@coderx and @OrlandoL discussed (a), (b) and how S5 might not become leader in the comments. Their discussion makes this answer more complete, so I am referencing it here.

Basically, (a) and (b) are not guaranteed to happen. There are cases where S5 won't be elected leader, for example because S3 and S4 have the same chance of becoming leader. (Please see the comments for details.)

Their assumption is correct: S5 may not become leader, and then the following procedure won't happen.

But let's go back to the paper Figure 8 and read the annotation of the figure:

A time sequence showing why a leader cannot determine commitment using log entries from older terms. In (a) S1 is leader and partially replicates the log entry at index 2. In (b) S1 crashes; S5 is elected leader for term 3 with votes from S3, S4, and itself, and accepts a different entry at log index 2.

IMO, the author is talking about the case where S5 is elected leader. Thus the whole procedure makes sense.

As @OrlandoL mentioned, in an MIT 6.824 lab you should consider all conditions to get a correct Raft implementation.

Hope this helps.

Li Jinyao
  • Also, in figure (b), how is the new leader elected with term 3? I understand with respect to the leader S1 in (a), the next term should be 3, but when S5 is getting the majority, it does not know about 3 since it got votes from S3, S4 & itself -> to all of them the current term is 1, so they would agree to term 2. Am I missing anything? – coderx Nov 28 '20 at 05:39
  • Also in the above figure (a) S1 could replicate data to S2 & crashed. The data remains in S2's persistent log file but not yet committed. Can S2 now get elected as the leader as it looks like it can gather votes from S3, S4 & itself? If S2 is elected as the new leader, it will replicate the entry to the followers. How will the client know about it, we already threw error when S1 crashed. Is it not some kind of inconsistency? – coderx Nov 28 '20 at 05:51
  • For your first comment about figure (b): in figure (a), when S1 becomes leader, at least S1 and two other nodes know the current term is 2, so at least one node among S3, S4, S5 knows the current term is 2. When voting begins, according to the rules for all servers, nodes holding lower terms will automatically become followers and match the highest term in the cluster, so S3, S4, S5 will all know the current term is 2. – Li Jinyao Dec 04 '20 at 02:55
  • There is a possibility that S5 could not vote for S1 probably due to network partition, still S1 can get elected as the leader for term 2. In that case, S3 or S4 or both updates its term to 2 since it voted for S1. However S5 still does not know about the higher term. So for the next term 3, S5 can't get elected assuming it could not vote for S1. I think the paper assumes that S5 also voted for S1, only in that case S5 would know that current term is 2 & the process can go on as described in the image. – coderx Dec 10 '20 at 03:43
  • @coderx You are right. the image is talking about the last case. It's all based on some kind of special cases. – Li Jinyao Dec 11 '20 at 07:19
  • @LiJinyao Sorry this comment is late. I'm not sure about "it is impossible to append log 2 to majority before log 3(the pink one appended at term 4) replicate at majority". What if in (c) the AppendEntries succeed only on S3, s.t. S3 will have 124, S2 will have 12 and S1 will have 124. Does that implicitly make 3 copies of log 2? And still there are 2 copies of log 4 so S5 could still get majority of votes? – OrlandoL Dec 12 '20 at 14:10
  • @LiJinyao continue on my own comment. In my hypothetic case in (c), S1 issues AppendEntries to all nodes, only S3 succeeds. Now the state is S1=124, S2=12, S3=124, S4=1, S5=13. The problem becomes, how does S5 get elected as leader? When S1 is dead and S5 RequestVote, does it get votes from S2+S4+itself to become leader? – OrlandoL Dec 12 '20 at 15:10
  • I think the problem is in (a) and (b). According to Raft paper section 8, before a leader does anything, it needs to commit a NOOP to a majority of nodes, to keep the nodes on the same page w.r.t. the previous terms and ensure log continuity. We assume log 2 stands for the log entry (or entries) for term 2. Since it is not agreed on by a majority in (b), the master has not started doing anything real in term 2 AT ALL! Thus we can actually lose 2 in a later term. If in (b) S1 had applied a NOOP to a majority including itself, then in (c) S5 would not be able to get elected! – OrlandoL Dec 13 '20 at 12:27
  • Also according to "Rules for Servers", when S5 is requesting vote (S5=1 3) and see any reply has a higher term (Maybe S3 has 1 2 4 now), it will update its own term and change to FOLLOWER. Therefore S5 will NOT become a leader anyways. This, together with the previous comment, makes sure there's no issue. – OrlandoL Dec 13 '20 at 14:04
  • See this sample implementation (https://github.com/ysn2233/MIT6.824/blob/e37c0da3eaee1e614afe9a0cc865526299178519/src/raft/raft.go#L234) for how a CANDIDATE switches to FOLLOWER if it sees any response with a higher term. – OrlandoL Dec 13 '20 at 14:07
  • @OrlandoL I've implemented Raft with MIT 6.824 too, but this answer is based on the question and the annotation of Figure 8 in the paper. It's talking about a case where (a) and (b) have already happened and S5 is elected leader. Your assumption is correct: there are many ways S5 can fail to be elected leader, but that's another case. IMO, the paper is talking about the case where S5 has already been elected LEADER, and then discusses the following procedure. So whether S5 will be elected is not discussed here. – Li Jinyao Dec 14 '20 at 03:40
  • @LiJinyao thanks for your reply! Indeed my followup question is not on what the paper originally was talking about or your answer was discussing. I guess that better belongs to a separate question. – OrlandoL Dec 31 '20 at 12:16
0

Raft doesn't commit entries from previous terms by counting replicas, because such entries could still be overwritten by a future leader, as leader S5 does in (d).

Suppose leader S1 in (c) committed the term-2 entry at index 2; that entry would then be applied by S1, S2 and S3. If S1 then crashed, it would be entirely possible for S5 to become leader as in (d), because its log is more up-to-date than those of S2, S3 and S4. S5 would overwrite the term-2 entry at index 2 with its own term-3 entry. This means leader S5 overwrites a committed entry! Some servers (S1, S2 and S3) have applied the term-2 entry, while others (S4, S5) would apply the term-3 entry at index 2, which violates the State Machine Safety property in Figure 3.
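
To make the "more up-to-date" comparison concrete, here is a minimal sketch of the log check from section 5.4.1 of the paper (compare the term of the last entry first, then the log length). The function name and the list-of-terms log representation are hypothetical. A real vote also depends on the voter's current term and on whether it has already voted, which this sketch leaves out.

```python
def log_is_up_to_date(candidate_terms, voter_terms):
    """Return True if the candidate's log is at least as up-to-date as the voter's.

    Each argument lists the term of every entry in a server's log, oldest first.
    The last entry's term is compared first; on a tie, the longer log wins.
    """
    cand_last = (candidate_terms[-1] if candidate_terms else 0, len(candidate_terms))
    voter_last = (voter_terms[-1] if voter_terms else 0, len(voter_terms))
    return cand_last >= voter_last


# Logs as drawn in Figure 8(c), written as the term of each entry.
logs = {"S1": [1, 2, 4], "S2": [1, 2], "S3": [1, 2], "S4": [1], "S5": [1, 3]}

# Servers whose log does not block a vote for S5 (S1 is assumed crashed).
willing = [s for s in ("S2", "S3", "S4") if log_is_up_to_date(logs["S5"], logs[s])]
print(willing)  # ['S2', 'S3', 'S4']
```

Together with its own vote, S5 could therefore collect a majority of the five servers, even though the term-2 entry is stored on three of them.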

So leader S1 of term 4 in (c) cannot commit the term-2 entry at index 2 until it commits an entry from its own term, such as the term-4 entry at index 3, as in (e). Once the term-4 entry at index 3 is committed, the term-2 entry at index 2 is committed indirectly, and neither will ever be overwritten by future leaders. (A candidate can become leader only if its log contains all committed entries from previous terms.)
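
The leader's commit rule from Figure 2 of the paper (find an N > commitIndex that is stored on a majority and whose entry has term == currentTerm) can be sketched as follows. This is an illustrative sketch, not a full implementation: the function name, the list-of-terms log, and the matchIndex values chosen for panels (c) and (e) are assumptions.

```python
def advance_commit_index(log_terms, match_index, commit_index, current_term):
    """Return the new commitIndex for a leader.

    log_terms:   term of each entry in the leader's log (index i is log_terms[i - 1]).
    match_index: follower name -> highest log index known to be stored on it.
    """
    cluster_size = len(match_index) + 1  # followers plus the leader itself
    majority = cluster_size // 2 + 1
    for n in range(len(log_terms), commit_index, -1):
        replicas = 1 + sum(1 for m in match_index.values() if m >= n)
        # Counting replicas only commits entries from the leader's current term.
        if replicas >= majority and log_terms[n - 1] == current_term:
            return n
    return commit_index


leader_log = [1, 2, 4]  # S1 in term 4

# (c): index 2 is on S1, S2, S3 but is from term 2, so nothing new commits.
print(advance_commit_index(leader_log, {"S2": 2, "S3": 2, "S4": 1, "S5": 1}, 1, 4))  # 1

# (e): index 3 (term 4) is on S1, S2, S3, so it commits, and index 2 with it.
print(advance_commit_index(leader_log, {"S2": 3, "S3": 3, "S4": 1, "S5": 1}, 1, 4))  # 3
```

In the second call the returned commitIndex is 3, so the term-2 entry at index 2 is covered without ever having been committed on its own.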

MrGuin