Our production version of DSE is 4.8.4 (Cassandra 2.1.12). We run a 3-node cluster with 256 vnodes per node, ~200 GB of data per node, and RF=3. We plan to migrate step by step to the latest DSE version, 5.1.1 (Cassandra 3.10.0).

According to the DataStax upgrade manual http://docs.datastax.com/en/upgrade/doc/upgrade/datastax_enterprise/upgdDSE50.html a repair should be done before starting the upgrade. We don't use incremental repairs, so to repair the entire cluster we ran a full sequential repair on a single node. After 12 hours, 100 of 768 token ranges have been repaired, but CPU usage is quite high and the number of SSTables for one of our tables is increasing almost linearly. We already have several issues with this table during normal operation, and one of the reasons for upgrading is to replace the existing DTCS compaction strategy with the new TWCS.
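To put the repair duration in perspective, here is a rough extrapolation from the numbers above. It is only a sketch: it assumes the repair keeps progressing at the same average rate per token range, which may not hold, and the function name `estimate_repair_hours` is made up for illustration.

```python
def estimate_repair_hours(ranges_done, ranges_total, hours_elapsed):
    """Linearly extrapolate total and remaining repair time.

    Assumes a constant average rate per token range, which is a
    simplification: ranges can differ a lot in size and load.
    """
    rate_per_hour = ranges_done / hours_elapsed
    total_hours = ranges_total / rate_per_hour
    return total_hours, total_hours - hours_elapsed


# Numbers from the question: 100 of 768 ranges (3 nodes x 256 vnodes) in 12 hours.
total, remaining = estimate_repair_hours(100, 768, 12)
print(f"estimated total: {total:.1f} h, remaining: {remaining:.1f} h")
```

At that pace the full repair would take close to four days, which is the duration we are worried about.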

We are concerned about the long repair time and the growing resource utilization. Is the repair 100% necessary before the upgrade? What are the consequences of doing it or skipping it? If we upgrade through several versions in sequence, should we run a repair after each upgrade?

Mikita Harbacheuski

2 Answers

So you don't run regular repairs at all? Doing so is highly recommended.

About the repair before the upgrade: from what I know it's just a precaution, as the upgrade process itself will not modify your data until you upgrade the SSTables at the end.

If you use the QUORUM consistency level, you should not be affected much by inconsistencies between nodes; they will eventually be fixed by read repair.
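The reasoning behind the QUORUM point can be sketched with a little arithmetic. This is only an illustration, assuming both reads and writes use QUORUM with RF=3 as in the question; the helper names are made up.

```python
def quorum(replication_factor):
    """Number of replicas that must respond at QUORUM."""
    return replication_factor // 2 + 1


def read_overlaps_write(replication_factor, read_replicas, write_replicas):
    """True when every read set shares at least one replica with every write set."""
    return read_replicas + write_replicas > replication_factor


rf = 3                # replication factor from the question
q = quorum(rf)        # 2 replicas for RF=3
print(q, read_overlaps_write(rf, q, q))
```

With RF=3 a quorum is 2 replicas, and 2 + 2 > 3, so a QUORUM read always touches at least one replica that acknowledged the latest QUORUM write.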

So I think it's safe, but you should ask DataStax just to be sure.

Igor Novgorodov

Running a repair before any node maintenance is required to prevent data loss. Loss is possible if the node under maintenance exclusively holds some portion of the data (for example, writes that have not yet reached the other replicas) and that node is completely broken during the maintenance.

Mikita Harbacheuski