How does etcd propagate writes to non-leader members?

Question

Both the visualisation on the raft.github.io page and The Secret Lives of Data show that write requests in Raft must be sent through the leader.

When I run etcd, which uses Raft, I can send an etcdctl put request to any of the etcd members, even one that is not the leader, and the write is still propagated across the cluster.

What is the mechanism behind this? Is it part of Raft? Is it specific to etcd or etcdctl?

score 2 · Answer 1 · answered May 29 '19 at 15:13

In the Raft paper, section 5.1:

The leader handles all client requests (if a client contacts a follower, the follower redirects it to the leader).

The paper suggests that a client may contact any peer. If that peer is the leader, it handles the request; otherwise, it replies with information about the current leader, and the client retries the request there. This way the client only needs to know one peer, and it eventually finds out who the leader is.
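A minimal sketch of this client-side redirect, assuming a toy in-memory cluster: `Peer`, `Reply`, `leader_hint` and `client_put` are hypothetical names for illustration, not etcd's API, and replication is left out entirely. The only point is the loop: start at any known peer, and on rejection retry at the leader the peer named.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Reply:
    ok: bool
    leader_hint: Optional[str] = None  # filled in by a follower that knows the leader


class Peer:
    """Hypothetical in-memory peer used only for illustration (not etcd's API)."""

    def __init__(self, name: str):
        self.name = name
        self.leader: Optional[str] = None  # who this peer believes the leader is
        self.store: Dict[str, str] = {}

    def put(self, key: str, value: str) -> Reply:
        if self.leader == self.name:
            # Leader: accept the write. Log replication is omitted in this sketch.
            self.store[key] = value
            return Reply(ok=True)
        # Not the leader: reject, but tell the client who the leader is (if known).
        return Reply(ok=False, leader_hint=self.leader)


def client_put(cluster: Dict[str, Peer], first: str, key: str, value: str,
               max_tries: int = 5) -> str:
    """Start at any one known peer and follow redirects; return the leader's name."""
    target = first
    for _ in range(max_tries):
        reply = cluster[target].put(key, value)
        if reply.ok:
            return target
        if reply.leader_hint is None:
            raise RuntimeError("no leader known (election in progress?); retry later")
        target = reply.leader_hint  # learn the leader and retry there
    raise RuntimeError("gave up after too many redirects")
```

Calling `client_put(cluster, "a", "foo", "bar")` on a cluster whose leader is `b` first hits `a`, gets `b` as the hint, and succeeds on the second try. The `max_tries` bound is there because leadership can change between attempts.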

Since all peers inside a cluster know about the other peers, request redirection can also be implemented on the peer itself: a follower forwards the request to the leader on the client's behalf. This way, the client is unaware of the redirection layer and still only needs to know one peer.
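A sketch of peer-side forwarding under the same toy assumptions: `Peer` and its fields are hypothetical, and "replication" is reduced to copying the value into every peer's store, whereas real Raft appends to a log and commits only after a majority acknowledges. What matters is that a follower's `put` calls the leader itself, so the client never sees a redirect.

```python
from typing import Dict, Optional


class Peer:
    """Hypothetical peer that forwards writes to the leader on the client's behalf."""

    def __init__(self, name: str, peers: Dict[str, "Peer"]):
        self.name = name
        self.peers = peers  # every peer knows the whole membership
        self.leader: Optional[str] = None
        self.store: Dict[str, str] = {}

    def put(self, key: str, value: str) -> bool:
        if self.leader is None:
            return False  # no leader known: the client has to retry later
        if self.leader != self.name:
            # Follower: forward transparently and relay the leader's answer.
            return self.peers[self.leader].put(key, value)
        # Leader: "replicate" by writing to every peer (no log, no quorum here).
        for peer in self.peers.values():
            peer.store[key] = value
        return True
```

The trade-off compared with client-side redirects is one extra network hop for writes that land on a follower, in exchange for a simpler client.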

Another option is to broadcast the request to all peers and check that only one peer responds with success (or none, if an election is in progress, in which case you retry the request). This means the client needs to know about all the peers, and if you have a dynamic Raft cluster, the client must also keep track of cluster configuration changes.
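A sketch of the broadcast option, assuming each peer is modelled as a callable that returns `True` only when it accepted the write as leader. `PeerFn` and `broadcast_put` are hypothetical stand-ins for real RPCs. Note that the client must hold the full membership map, which is exactly the burden described above.

```python
from typing import Callable, Dict, Optional

# Hypothetical stand-in for an RPC: returns True only if that peer accepted
# the write as leader.
PeerFn = Callable[[str, str], bool]


def broadcast_put(peers: Dict[str, PeerFn], key: str, value: str) -> Optional[str]:
    """Send the write to every known peer and return the one that accepted it.

    Returns None when no peer accepted (e.g. an election is in progress),
    in which case the caller should retry.
    """
    accepted = [name for name, send in peers.items() if send(key, value)]
    if len(accepted) > 1:
        # The approach expects at most one success; treat anything else as an error.
        raise RuntimeError(f"multiple peers accepted the write: {accepted}")
    return accepted[0] if accepted else None
```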

The mechanism behind this is specific to etcd and its implementation. From the docs available on GitHub:

'MsgProp' proposes to append data to its log entries. This is a special type to redirect proposals to leader. Therefore, send method overwrites raftpb.Message's term with its HardState's term to avoid attaching its local term to 'MsgProp'. When 'MsgProp' is passed to the leader's 'Step' method, the leader first calls the 'appendEntry' method to append entries to its log, and then calls 'bcastAppend' method to send those entries to its peers. When passed to candidate, 'MsgProp' is dropped. When passed to follower, 'MsgProp' is stored in follower's mailbox(msgs) by the send method. It is stored with sender's ID and later forwarded to leader by rafthttp package.
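The role-dependent handling in that quote can be modelled with a toy node whose `step` method branches on its role. The names `Node`, `step`, `deliver` and the string message types are hypothetical and only loosely mirror the quoted doc (`Step`, `appendEntry`, `bcastAppend`, `msgs`). This is not the etcd raft API: terms, log indexes, commit tracking and the term handling the quote mentions are all omitted.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

LEADER, FOLLOWER, CANDIDATE = "leader", "follower", "candidate"


@dataclass
class Message:
    type: str  # only "MsgProp" and "MsgApp" are modelled here
    frm: str
    to: str
    entries: List[str]


@dataclass
class Node:
    """Toy model of the MsgProp handling quoted above (not etcd's API)."""
    id: str
    peers: List[str]
    role: str = FOLLOWER
    lead: Optional[str] = None
    log: List[str] = field(default_factory=list)
    msgs: List[Message] = field(default_factory=list)  # outbound "mailbox"

    def step(self, m: Message) -> None:
        if m.type == "MsgApp":
            self.log.extend(m.entries)  # accept replicated entries (no consistency checks)
            return
        if m.type != "MsgProp":
            return
        if self.role == LEADER:
            self.log.extend(m.entries)  # like appendEntry
            for p in self.peers:  # like bcastAppend
                if p != self.id:
                    self.msgs.append(Message("MsgApp", self.id, p, list(m.entries)))
        elif self.role == CANDIDATE:
            return  # proposal dropped
        elif self.lead is not None:
            # Follower: queue the proposal, addressed to the leader, in the mailbox;
            # a transport layer (rafthttp in etcd) delivers it later.
            self.msgs.append(Message("MsgProp", self.id, self.lead, list(m.entries)))
        # Follower with no known leader: dropped in this sketch.


def deliver(nodes: Dict[str, Node]) -> None:
    """Hypothetical transport: drain every mailbox into the addressee's step()."""
    pending = True
    while pending:
        pending = False
        for node in nodes.values():
            outbox, node.msgs = node.msgs, []
            for m in outbox:
                pending = True
                nodes[m.to].step(m)
```

With `b` as leader, stepping a proposal into follower `a` only queues a `MsgProp` addressed to `b`; after `deliver`, the leader has appended the entry and its `MsgApp` messages have put it in every node's log. This separation between the state machine (which only fills `msgs`) and the transport (which sends them) is why a follower can accept a client's write without knowing how to reach the leader itself.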
