Will any issues arise if I deprioritize the Cassandra "nodetool repair" command using "nice"? The repair causes high CPU "user time" load and is hurting our production systems, causing API timeouts on our Usergrid implementation. I see documentation on limiting network throughput, but iowait does not appear to be the issue. Additionally, are there any good methods for mitigating this problem?

1 Answer

The nodetool command doesn't actually do any of the work itself. It just calls a JMX operation in C* to kick off the repair and then listens for updates to print out. The repair runs inside the Cassandra process, so running nodetool under nice won't make any difference. There are a few main phases to a repair:

  1. build merkle trees (on each node)
  2. stream changes
  3. compactions

Possibly the validation compaction (which on some versions can be controlled with the compaction throttle) or the streams (stream throughput can be set via nodetool or cassandra.yaml) are burning your CPU. If so, you can try using the throttles, but in some versions they won't make a difference.
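As a rough sketch of trying those throttles, the Python snippet below builds the two `nodetool` invocations (`setcompactionthroughput`, in MB/s, and `setstreamthroughput`, in megabits/s) and prints them by default rather than running them. The helper names and the values 8 and 100 are purely illustrative assumptions, not recommendations; the matching cassandra.yaml setting names differ between versions, so check the file shipped with yours.

```python
import subprocess


def throttle_commands(compaction_mb_per_s, stream_megabits_per_s, host=None):
    """Build nodetool invocations that set the compaction and stream caps.

    The numbers passed in are the caller's choice; nothing here is tuned.
    """
    base = ["nodetool"]
    if host:
        base += ["-h", host]  # target a specific node instead of localhost
    return [
        base + ["setcompactionthroughput", str(compaction_mb_per_s)],
        base + ["setstreamthroughput", str(stream_megabits_per_s)],
    ]


def apply_throttles(commands, dry_run=True):
    """Print the commands (dry run) or execute them one by one."""
    for cmd in commands:
        if dry_run:
            print(" ".join(cmd))
        else:
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Illustrative values only; start from your current settings.
    apply_throttles(throttle_commands(8, 100))
```

Settings changed through nodetool apply to the running node only and are lost on restart unless cassandra.yaml is updated as well.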

After the repair completes, normal compactions kick off: for anticompaction in incremental repairs, and also after full repairs if a lot of differences were streamed. Some problems are very version specific, so pay attention to the logs and to when the CPU is high in order to drill down further.
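One way to correlate CPU spikes with repair phases is to capture a few nodetool views at the moment the load is high. The sketch below, with hypothetical helper names, collects the output of `nodetool compactionstats`, `nodetool netstats`, and `nodetool tpstats`; which of these is most telling depends on your version, so treat the selection as an assumption.

```python
import subprocess

# Subcommands that show active compactions, streaming, and thread pools.
DIAGNOSTICS = ("compactionstats", "netstats", "tpstats")


def diagnostic_commands(host=None):
    """Return one nodetool command list per diagnostic subcommand."""
    base = ["nodetool"] + (["-h", host] if host else [])
    return [base + [sub] for sub in DIAGNOSTICS]


def snapshot(host=None, runner=subprocess.run):
    """Run each diagnostic and map subcommand name to its captured stdout.

    `runner` is injectable so the function can be exercised without a node.
    """
    results = {}
    for cmd in diagnostic_commands(host):
        done = runner(cmd, capture_output=True, text=True, check=False)
        results[cmd[-1]] = done.stdout
    return results
```

Calling `snapshot()` from a cron job or a loop while the repair runs, and saving the results with a timestamp, gives something to line up against the CPU graphs and the Cassandra logs.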

Chris Lohfink