Massive flaw in raft algorithm

Question

So the raft dissertation and paper say this is how to handle append entries:

Receiver implementation:

Reply false if term < currentTerm (§5.1)

Reply false if log doesn’t contain an entry at prevLogIndex whose term matches prevLogTerm (§5.3)

If an existing entry conflicts with a new one (same index but different terms), delete the existing entry and all that follow it (§5.3)

Append any new entries not already in the log

If leaderCommit > commitIndex, set commitIndex = min(leaderCommit, index of last new entry)

Leaders:

• Upon election: send initial empty AppendEntries RPCs (heartbeat) to each server; repeat during idle periods to prevent election timeouts (§5.2)

• If command received from client: append entry to local log, respond after entry applied to state machine (§5.3)

• If last log index ≥ nextIndex for a follower: send AppendEntries RPC with log entries starting at nextIndex

• If successful: update nextIndex and matchIndex for follower (§5.3)

• If AppendEntries fails because of log inconsistency: decrement nextIndex and retry (§5.3)

• If there exists an N such that N > commitIndex, a majority of matchIndex[i] ≥ N, and log[N].term == currentTerm: set commitIndex = N (§5.3, §5.4).

Rules for Servers Figure 2: A condensed summary of the Raft consensus algorithm (excluding membership changes and log co

However, this simply will never work in a situation where you are sending more than 1 entry at a time. If I send 3 entries, a single success will only increment by 1, lol. How do I make sense of this? How are we able to confirm that ALL of the entries were sent from nextIndex[serverId] -> to the end of the log?

What do you mean by "If I send 3 entries, a single success will only increment by 1" - what will be incremented by 1? — AndrewR, Mar 17 '23 at 01:28

score 0 · Answer 1 · answered Apr 06 '23 at 13:23

I'm assuming your question is about sending multiple AppendEntries RPCs while a peer is down or unable to respond: the leader may accumulate several log entries from clients in the meantime, so each AppendEntries RPC can carry different log entries (and different values for prevLogIndex and prevLogTerm).

If so, this isn't really a flaw but an implementation detail: you decide how to handle multiple AppendEntries RPCs in flight to the same peer when you may receive a response to only one of them rather than to each.

Take a look at https://groups.google.com/g/raft-dev/c/PnvThDlMczU
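
To make the batching point concrete, here is a minimal sketch of how a leader might process an AppendEntries reply. It assumes the leader keeps the request it sent (or that the reply echoes prevLogIndex and the number of entries), so that a success advances matchIndex by the size of the batch rather than by 1. The names `AppendEntriesRequest`, `LeaderState`, and `on_append_entries_reply` are hypothetical and do not come from the paper.

```python
from dataclasses import dataclass, field


@dataclass
class AppendEntriesRequest:
    prev_log_index: int  # index of the entry immediately before the batch
    entries: list        # the batch that was sent (empty for a heartbeat)


@dataclass
class LeaderState:
    next_index: dict = field(default_factory=dict)   # peer -> next index to send
    match_index: dict = field(default_factory=dict)  # peer -> highest index known replicated


def on_append_entries_reply(state, peer, request, success):
    """Update per-peer bookkeeping from the reply to one specific request."""
    known = state.match_index.get(peer, 0)
    if success:
        # A success means the follower's log now matches the leader's
        # through the last entry of this batch, however many entries it had.
        acked = request.prev_log_index + len(request.entries)
        # max() keeps matchIndex from moving backwards when replies
        # arrive late or out of order (an assumption of this sketch).
        state.match_index[peer] = max(known, acked)
        state.next_index[peer] = state.match_index[peer] + 1
    else:
        # Log inconsistency: back up one entry and retry, but never
        # below what is already known to be replicated.
        current = state.next_index.get(peer, 1)
        state.next_index[peer] = max(known + 1, current - 1)
```

With this bookkeeping, a request with prevLogIndex 4 that carries 3 entries sets matchIndex to 7 and nextIndex to 8 on a single success, so one reply confirms the whole batch.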
