Log cleaning removes entries that no longer contribute to the current state. How, then, can the leader construct AppendEntries requests for those removed entries when a slow follower or a new member needs them? Does AppendEntries need to be modified so that it can carry non-consecutive entries, or should a snapshot be used instead?
Copycat implements a form of incremental compaction using an algorithm somewhat similar to the log cleaning algorithm described in the Raft dissertation, so there is some precedent and code for how to do this. Copycat's algorithm differs from the one in the Raft literature in that it retains the positions of entries in the log after compaction, rather than copying live entries to the head of the log, so that it can take advantage of sequential reads. Even so, the AppendEntries RPCs sent to followers can still contain gaps in their batches of entries.
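To make "retaining positions" concrete, here is a minimal Python sketch, not Copycat's code. A hypothetical `Segment` keeps one slot per index; compaction clears the slots of superseded entries in place instead of moving the survivors, so a sequential read of a range yields a batch with gaps. The `Entry`/`Segment` names and the "latest write per key wins" liveness rule are assumptions for illustration only.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Entry:
    index: int
    key: str
    value: str


class Segment:
    """Hypothetical log segment whose slots keep their positions after compaction."""

    def __init__(self, first_index: int):
        self.first_index = first_index
        self.slots: list[Optional[Entry]] = []  # slots[i] holds index first_index + i

    def append(self, entry: Entry) -> None:
        assert entry.index == self.first_index + len(self.slots)
        self.slots.append(entry)

    def compact(self, is_live: Callable[[Entry], bool]) -> None:
        # Clear the slots of entries that no longer contribute to state,
        # without shifting surviving entries toward the head of the segment.
        self.slots = [e if e is not None and is_live(e) else None for e in self.slots]

    def read_batch(self, from_index: int, max_entries: int) -> list[Entry]:
        # Sequential scan by position; cleaned slots leave gaps in the batch.
        start = from_index - self.first_index
        return [e for e in self.slots[start:start + max_entries] if e is not None]


# Usage: entry 2 ("b" = "2") is superseded by entry 4 ("b" = "4").
seg = Segment(first_index=1)
writes = [("a", "1"), ("b", "2"), ("c", "3"), ("b", "4")]
for i, (k, v) in enumerate(writes, start=1):
    seg.append(Entry(i, k, v))

latest = {e.key: e.index for e in seg.slots if e is not None}
seg.compact(lambda e: latest[e.key] == e.index)
batch = seg.read_batch(1, 10)
print([e.index for e in batch])  # [1, 3, 4]: index 2 is a gap
```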
We handle the missing entries simply by including the index with each entry in an AppendRequest batch. This also requires a mechanism in the log for skipping entries on followers: if a follower is receiving entries from a compacted segment of the leader's log, it must reproduce the structure of the leader's log by skipping the missing indexes as it writes entries.
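A minimal Python sketch of the follower side, assuming a hypothetical `AppendRequest` whose entries each carry their own index (this is not Copycat's actual message class). Raft's term check and conflict truncation are deliberately left out; the point is only that the follower fills each missing index with an empty slot so every entry lands at the same position it occupies on the leader.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Entry:
    index: int
    term: int
    command: str


@dataclass
class AppendRequest:
    # Hypothetical message shape: every entry carries its own index,
    # so the batch may contain gaps left by compaction on the leader.
    term: int
    prev_log_index: int
    entries: list[Entry]


class FollowerLog:
    def __init__(self):
        self.slots: list[Optional[Entry]] = []  # slots[i] holds index i + 1

    def last_index(self) -> int:
        return len(self.slots)

    def append(self, request: AppendRequest) -> bool:
        # Simplified: only accept batches that continue from our last index.
        # Raft's term checks and conflict truncation are omitted here.
        if request.prev_log_index != self.last_index():
            return False
        for entry in request.entries:
            # Leave an empty slot for every index the leader compacted away,
            # so this entry lands at the same position as in the leader's log.
            while self.last_index() < entry.index - 1:
                self.slots.append(None)
            self.slots.append(entry)
        return True


# Usage: the leader's batch skips index 2.
log = FollowerLog()
ok = log.append(AppendRequest(term=1, prev_log_index=0, entries=[
    Entry(1, 1, "set a"), Entry(3, 1, "set c"), Entry(4, 1, "set b"),
]))
print(ok, [e.index if e is not None else None for e in log.slots])
# True [1, None, 3, 4]
```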
There are other challenges with incremental compaction in Raft that the Raft literature does not describe extensively, particularly around handling tombstones. One issue is that a tombstone can't be removed from the log until it has been applied on all servers. If a tombstone is committed and then removed from the leader's log before it's replicated to a follower (which may be partitioned), that follower may never delete its state. In Copycat, this required adding a globalIndex to track the highest index stored on all servers.
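The tombstone rule can be sketched in Python with hypothetical names. Here the global index is taken as the minimum match index over all servers, leader included, which follows the description above of "the highest index stored on all servers"; Copycat's own bookkeeping may differ in detail.

```python
def global_index(match_index: dict[str, int]) -> int:
    """Highest log index known to be stored on every server (leader included)."""
    return min(match_index.values())


def can_remove_tombstone(tombstone_index: int, commit_index: int,
                         match_index: dict[str, int]) -> bool:
    # A tombstone may be dropped only once it is committed AND stored on every
    # server; otherwise a partitioned follower could miss the delete and keep
    # stale state forever.
    return (tombstone_index <= commit_index
            and tombstone_index <= global_index(match_index))


# Usage: f2 is lagging (for example, partitioned) at index 4.
match = {"leader": 10, "f1": 10, "f2": 4}
print(global_index(match))                                          # 4
print(can_remove_tombstone(7, commit_index=10, match_index=match))  # False
print(can_remove_tombstone(3, commit_index=10, match_index=match))  # True
```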
But I digress, and I realize I've over-answered your question. Incremental compaction in Raft is an interesting and challenging problem. If you're interested in reading more about how Copycat solves it, I've written extensive documentation on Copycat's incremental compaction algorithm, including in-depth descriptions of various issues with handling tombstones and approaches for implementing snapshots on top of an incremental compaction algorithm.
If you learn one thing from Copycat's documentation, it will likely be that there's a lot of complexity in incremental compaction in Raft. It took us many months to work out all the algorithms therein, but perhaps the lessons we learned can be of use to you.
Certainly, implementing snapshots in Raft is significantly easier than implementing incremental compaction algorithms like Copycat's, but it still has complexities. For example, Java is not well suited to forking a process to avoid blocking during snapshots, which was one of the reasons we chose to write an incremental compaction algorithm for Raft. Supporting large snapshots in Copycat would require either copying the full state machine state in memory or adding a leader transfer mechanism so that leaders are not blocked while snapshotting large state machines. Weigh these options against the realities of your environment.
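As one illustration of the "copy the full state in memory" option, here is a minimal Python sketch using a hypothetical toy state machine (not Copycat's API). The state is copied while briefly holding a lock, and the copy is serialized on a background thread so that commands keep being applied. The trade-off is that memory for the copied state is needed for as long as the snapshot is being written.

```python
import copy
import json
import threading


class KeyValueStateMachine:
    """Hypothetical toy state machine used to illustrate copy-then-write snapshots."""

    def __init__(self):
        self.state: dict[str, str] = {}
        self.last_applied = 0
        self._lock = threading.Lock()

    def apply(self, index: int, key: str, value: str) -> None:
        with self._lock:
            self.state[key] = value
            self.last_applied = index

    def snapshot_async(self, sink: list) -> threading.Thread:
        # Pay for a full in-memory copy while holding the lock...
        with self._lock:
            frozen = copy.deepcopy(self.state)
            index = self.last_applied

        def write() -> None:
            # ...then serialize outside the lock, so apply() is not blocked
            # for the duration of the (possibly slow) write.
            sink.append((index, json.dumps(frozen, sort_keys=True)))

        t = threading.Thread(target=write)
        t.start()
        return t


# Usage: writes continue while the snapshot taken at index 2 is written.
sm = KeyValueStateMachine()
sm.apply(1, "a", "1")
sm.apply(2, "b", "2")
snapshots = []
t = sm.snapshot_async(snapshots)
sm.apply(3, "a", "changed")
t.join()
print(snapshots[0])  # (2, '{"a": "1", "b": "2"}')
```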

Thank you very much! In fact, I'm reading the Copycat documentation, which indeed supplies many practical details not covered by the Raft paper or thesis. – kingluo May 16 '16 at 05:33
Awesome! You can also chat or email me directly if you have any questions. I'm more than happy to discuss it further off SO. – kuujo May 16 '16 at 16:43