I came up with one question when reading the Raft paper. The scenario is as follows. There are three newly started Raft instances: R1, R2, and R3. R1 is elected leader with term 1, nextIndex {1, 1, 1}, and matchIndex {0, 0, 0}. It then receives command 1 from the client, and the logs of the instances are as follows:
R1: [index 0, command 0], [index 1, command 1]
R2: [index 0, command 0]
R3: [index 0, command 0]
What if the network is not reliable? If R1 sends this entry to R2 but the AppendEntries RPC times out, R1 has to resend [index 1, command 1]. It may then receive the reply {term: 1, success: true} twice.
The paper says:
If last log index ≥ nextIndex for a follower: send AppendEntries RPC with log entries starting at nextIndex
• If successful: update nextIndex and matchIndex for follower (§5.3)
• If AppendEntries fails because of log inconsistency: decrement nextIndex and retry (§5.3)
So the leader R1 will increase nextIndex and matchIndex twice, giving nextIndex {1, 3, 1} and matchIndex {0, 2, 0}, which is not correct. When the leader sends the next AppendEntries RPC (a heartbeat or log replication), it can fix nextIndex through the decrement-and-retry rule, but matchIndex will never have a chance to be fixed.
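To make the scenario concrete, here is a minimal Python sketch of the reading of the rule I describe above, where each successful reply bumps both counters by one. The class and method names (`NaiveLeader`, `on_append_success`) are my own, not from the paper, and the sketch assumes one entry per RPC as in my example.

```python
from dataclasses import dataclass, field


@dataclass
class NaiveLeader:
    # Position 0 is the leader itself (R1); positions 1 and 2 are R2 and R3.
    next_index: list = field(default_factory=lambda: [1, 1, 1])
    match_index: list = field(default_factory=lambda: [0, 0, 0])

    def on_append_success(self, follower: int) -> None:
        # Assumed (naive) interpretation of "update nextIndex and matchIndex":
        # increment both by one for every successful reply, with no check for
        # whether this reply duplicates one already processed.
        self.next_index[follower] += 1
        self.match_index[follower] += 1


leader = NaiveLeader()

# The original RPC to R2 timed out and was retried, but both replies
# {term: 1, success: true} eventually reach the leader.
leader.on_append_success(1)
leader.on_append_success(1)

print(leader.next_index)   # [1, 3, 1]
print(leader.match_index)  # [0, 2, 0]
```

R2 holds only one new entry, yet the leader now believes index 2 is replicated there.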
My solution is to add a sequence number to both the AppendEntries arguments and results for every RPC call. However, I was wondering if there is a way to solve this problem using only the arguments given in the paper, that is, without the sequence number.
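A rough sketch of the sequence-number idea, again in Python. Everything here is an assumption of mine rather than something from the paper: the `seq` field, the `pending` table, and the choice to accept only the reply that matches the most recent RPC sent to a follower. It also assumes one entry per RPC.

```python
class SeqLeader:
    def __init__(self, cluster_size: int) -> None:
        self.next_index = [1] * cluster_size
        self.match_index = [0] * cluster_size
        self.next_seq = 0
        # Sequence number of the latest unanswered RPC per follower
        # (hypothetical bookkeeping, not part of the paper's state).
        self.pending = [None] * cluster_size

    def send_append(self, follower: int) -> int:
        # Every RPC, including a retry, gets a fresh sequence number.
        self.next_seq += 1
        self.pending[follower] = self.next_seq
        return self.next_seq

    def on_reply(self, follower: int, seq: int, success: bool) -> bool:
        # Ignore replies to superseded RPCs and replies already processed.
        if seq != self.pending[follower]:
            return False
        self.pending[follower] = None
        if success:
            self.next_index[follower] += 1
            self.match_index[follower] += 1
        return True


leader = SeqLeader(3)
first = leader.send_append(1)   # original RPC to R2, which times out
retry = leader.send_append(1)   # resend of the same entry

stale_applied = leader.on_reply(1, first, True)   # late reply to the original
retry_applied = leader.on_reply(1, retry, True)   # reply to the retry
dup_applied = leader.on_reply(1, retry, True)     # a duplicated reply

print(stale_applied, retry_applied, dup_applied)  # False True False
print(leader.next_index, leader.match_index)      # [1, 2, 1] [0, 1, 0]
```

With this bookkeeping only one of the replies updates the counters, at the cost of the extra field in every request and response.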
Any advice will be appreciated and thank you in advance.