This is a two-part question regarding nodetool repair and garbage collection.
Let's assume a replication factor of 3 for all tables, and suppose every read and write needs acknowledgements from two replicas to succeed (i.e. quorum). Based on my understanding of Cassandra, a successful write or delete can never be missed as long as each read waits for at least two responses and accepts only the one with the latest timestamp. This makes sense to me, but is it correct?
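To make the overlap argument concrete, here is a minimal sketch in Python. It is a toy model, not Cassandra code: the replica names `a`, `b`, `c` and the helpers `read_latest` and `quorum_always_sees_write` are made up for illustration, and it assumes the write was fully acknowledged by its two replicas and that nothing is ever purged. It only checks that any two-replica read overlaps any two-replica write when there are three replicas.

```python
from itertools import combinations

REPLICAS = ("a", "b", "c")  # replication factor 3 (hypothetical node names)
QUORUM = 2                  # acknowledgements required for reads and writes


def read_latest(stores, contacted):
    """Return the (timestamp, value) pair with the highest timestamp
    among the replicas that were contacted."""
    return max(stores[r] for r in contacted)


def quorum_always_sees_write():
    """Try every pairing of a write quorum with a read quorum."""
    old = (1, "old")
    new = (2, "new")  # could equally stand for a delete marker
    for write_set in combinations(REPLICAS, QUORUM):
        # Replicas in the write quorum hold the new record, the rest the old one.
        stores = {r: (new if r in write_set else old) for r in REPLICAS}
        for read_set in combinations(REPLICAS, QUORUM):
            if read_latest(stores, read_set) != new:
                return False
    return True


print(quorum_always_sees_write())  # prints True
```

In this model every read quorum shares at least one replica with every write quorum, so the newest timestamp is always among the responses. Whether the real system preserves that guarantee in all failure cases is exactly what I am asking.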
As a closely related question: if I configure Cassandra never to garbage-collect tombstones on its own, but still run nodetool repair periodically, will the repair suffice to garbage-collect old tombstones? Intuitively, a key range that has been successfully repaired should no longer need its tombstones, so in theory they could be discarded as part of the repair. Is this the case?
If my above two hypotheses are correct, it seems like we can achieve the following:
- Consistent reads and writes with no resurrected data (due to quorum reads and writes and avoiding GC completely)
- No unbounded growth in stale tombstones (due to periodically running nodetool repair, assuming it performs GC as hypothesized above)
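For context, the resurrection scenario I am trying to avoid in the first point can be sketched like this. Again, this is a toy Python model under my own assumptions, not Cassandra's actual behavior: `TOMBSTONE`, `read_latest` and `purge_tombstones` are hypothetical names, and the model simply drops tombstones on two replicas before the third replica has learned about the delete.

```python
TOMBSTONE = None  # marker for a deleted value in this toy model


def read_latest(stores, contacted):
    """Last-write-wins over the contacted replicas.
    Replicas that hold no record for the key are skipped."""
    records = [stores[r] for r in contacted if r in stores]
    if not records:
        return None
    timestamp, value = max(records, key=lambda rec: rec[0])
    return value  # None means deleted or absent


def purge_tombstones(stores, replicas):
    """Drop tombstone records on the given replicas (models tombstone GC)."""
    for r in replicas:
        if r in stores and stores[r][1] is TOMBSTONE:
            del stores[r]


# The delete was acknowledged by a and b; c missed it and still has the old value.
stores = {"a": (2, TOMBSTONE), "b": (2, TOMBSTONE), "c": (1, "old")}

before = read_latest(stores, ("a", "c"))  # the tombstone on a wins
purge_tombstones(stores, ("a", "b"))      # tombstones dropped before c is repaired
after = read_latest(stores, ("a", "c"))   # only c has a record now

print(before, after)  # prints: None old
```

If repair had propagated the tombstone to `c` before the purge, the read would have stayed empty, which is why I am hoping repair and tombstone removal can be tied together.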