We have a 5-node Cassandra cluster. The Cassandra version is 2.0.7. The OS is Oracle Enterprise Linux 6.5.

The Java environment is:

-bash-4.1$ java -version
java version "1.7.0_55"
Java(TM) SE Runtime Environment (build 1.7.0_55-b13)
Java HotSpot(TM) 64-Bit Server VM (build 24.55-b03, mixed mode)

Node repair hangs at random. When it does, the output log shows:

-------------- Repairing... ------------------------------------------------
[2014-05-05 20:00:02,305] Starting repair command #7, repairing 728 ranges for keyspace ???

And it just hangs without making any progress.

Any idea how to find the root cause of the problem?

Thanks!

Martin Serrano

1 Answer

I am experiencing the same problem with Cassandra 2.0.7. Usually it hangs after it sends merkle tree requests to the replication partner nodes, and then fails to create its own snapshot to send that tree back to itself. The log message looks like this:

INFO [RepairJobTask:1] 2014-06-10 18:56:42,176 RepairJob.java (line 134) [repair #3c663fb1-f0ce-11e3-ac99-f9b8874f4c5e] requesting merkle trees for <CF_Name> (to [/10.0.4.101, /10.0.2.91, /10.0.3.91, /10.0.3.111, /10.0.4.111, /10.0.4.92, /10.0.2.101, /10.0.3.101])

The only way to push the repair forward is to restart Cassandra on one of the nodes in that list (not the repairing node itself). This throws a few errors, but at least the rest of the repair continues.
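
To narrow down which node to restart, it may help to check which replicas never answered the merkle tree request. Below is a rough Python sketch, not a tested tool. It assumes each reply is logged as `[repair #<id>] Received merkle tree for <CF> from /<ip>`; that wording is my assumption about the 2.0.x log format, so verify it against your own system.log. The function name `pending_merkle_trees` is made up for this example.

```python
import re
from collections import defaultdict

# Matches the request line quoted above (RepairJob.java).
REQUEST_RE = re.compile(
    r"\[repair #(?P<session>[0-9a-f-]+)\] requesting merkle trees for "
    r"(?P<cf>\S+) \(to \[(?P<nodes>[^\]]*)\]\)"
)

# ASSUMPTION: wording of the reply line; check it against your own log.
RECEIVED_RE = re.compile(
    r"\[repair #(?P<session>[0-9a-f-]+)\] Received merkle tree for "
    r"(?P<cf>\S+) from (?P<node>/[^\s,]+)"
)


def pending_merkle_trees(lines):
    """Return {(session_id, cf): [nodes that have not replied yet]}."""
    pending = defaultdict(set)
    for line in lines:
        match = REQUEST_RE.search(line)
        if match:
            # Record every node the repair coordinator asked for a tree.
            nodes = {n.strip() for n in match.group("nodes").split(",") if n.strip()}
            pending[(match.group("session"), match.group("cf"))] |= nodes
            continue
        match = RECEIVED_RE.search(line)
        if match:
            # Cross off a node once its tree has arrived.
            key = (match.group("session"), match.group("cf"))
            pending[key].discard(match.group("node"))
    # Keep only the requests that still have unanswered nodes.
    return {key: sorted(nodes) for key, nodes in pending.items() if nodes}
```

You would call it with the lines of system.log from the repairing node, for example `pending_merkle_trees(open(path))` where `path` points at your log file. Any node still listed for the stuck session is a candidate for the restart described above.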

Roman Tumaykin