This is a two-part question regarding nodetool repair and garbage collection.

Let's consider a replication factor of 3 for all tables, and suppose reads and writes each require two confirmations to succeed. Based on my understanding of Cassandra, a successful write or delete would never be in danger of being missed as long as a read requires at least two responses and accepts only the value with the latest timestamp. This makes sense to me, but is it correct?
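To make the reasoning above concrete, here is a minimal sketch in Python. It is a toy model and not Cassandra code: the function names are hypothetical, replicas are plain integers, and it assumes every acknowledged write reached exactly the replicas that acknowledged it. It only checks that any set of 2 write acknowledgements and any set of 2 read responses out of 3 replicas must share at least one replica, and that picking the latest timestamp then returns the newest write or delete.

```python
from itertools import combinations

RF = 3  # replication factor from the question
W = 2   # acknowledgements required for a write or delete
R = 2   # responses required for a read


def quorums_always_overlap(rf, w, r):
    """Return True if every w-replica write set shares at least one
    replica with every r-replica read set (equivalent to w + r > rf)."""
    replicas = range(rf)
    return all(
        set(write_set) & set(read_set)
        for write_set in combinations(replicas, w)
        for read_set in combinations(replicas, r)
    )


def read_latest(responses):
    """Pick the value with the highest timestamp from (timestamp, value)
    pairs. A delete (tombstone) is modeled here as the value None."""
    return max(responses, key=lambda pair: pair[0])[1]


if __name__ == "__main__":
    print(quorums_always_overlap(RF, W, R))  # two-of-three for both reads and writes
    print(quorums_always_overlap(RF, 1, R))  # a single write ack can be missed
    # One replica still has the old value, the other has the newer delete.
    print(read_latest([(1, "old value"), (2, None)]))
```

The sketch deliberately ignores everything outside the overlap argument, such as writes that time out after reaching some replicas, clock skew between writers, and tombstones being purged before every replica has seen them.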

As a closely related question, if I configure Cassandra never to perform GC, but still perform nodetool repair periodically, will this suffice to garbage-collect old tombstones? Intuitively, a successfully repaired key range should not need to keep tombstones, so they could in theory be discarded when a repair is performed. Is this the case?

If my above two hypotheses are correct, it seems like we can achieve the following:

  1. Consistent reads and writes with no resurrected data (due to quorum reads and writes and avoiding GC completely)
  2. No unbounded growth in stale tombstones (due to periodically running nodetool repair, which hopefully performs GC if my above hypothesis is correct)
jonderry

1 Answer

This post explains that quorum doesn't guarantee consistency: Read Operation in Cassandra at Consistency level of Quorum?

Assuming "GC" means compaction, I don't think nodetool repair will suffice to delete tombstones or take care of other compaction tasks. https://issues.apache.org/jira/browse/CASSANDRA-6602 describes a compaction-less scenario that sounds like what you're considering. If this is what you're doing, the recommended solution is to use DateTieredCompactionStrategy (DTCS) to store data written within a certain period of time in the same SSTable. DTCS was released in Cassandra 2.1.1 today and is described here: http://www.datastax.com/documentation/cql/3.1/cql/cql_reference/tabProp.html?scroll=tabProp__moreCompaction

catpaws
  • It seems that post does suggest quorum is sufficient when used for both reads and writes (as per the title in this question). – jonderry Oct 24 '14 at 20:40
  • Yes, QUORUM gives you strong consistency if you can tolerate some level of failure: http://www.datastax.com/documentation/cassandra/2.1/cassandra/dml/dml_config_consistency_c.html – catpaws Oct 24 '14 at 22:00