I have a DataStax Cassandra 2.2.3 cluster (only one node) and, as a test, I'm adding a new node. After successfully adding the new node and starting it with bootstrap=false, I'm trying to rebalance it with nodetool repair.

However, this error shows up in the logs of the old node:

ERROR [SharedPool-Worker-142] 2015-10-30 14:02:41,993 JVMStabilityInspector.java:117 - JVM state determined to be unstable.  Exiting forcefully due to:
java.lang.OutOfMemoryError: Java heap space
    at java.nio.HeapByteBuffer.<init>(HeapByteBuffer.java:57) ~[na:1.7.0_80]
    at java.nio.ByteBuffer.allocate(ByteBuffer.java:331) ~[na:1.7.0_80]
    at org.apache.cassandra.utils.memory.SlabAllocator.getRegion(SlabAllocator.java:137) ~[apache-cassandra-2.2.3.jar:2.2.3]
    at org.apache.cassandra.utils.memory.SlabAllocator.allocate(SlabAllocator.java:97) ~[apache-cassandra-2.2.3.jar:2.2.3]
    at org.apache.cassandra.utils.memory.ContextAllocator.allocate(ContextAllocator.java:57) ~[apache-cassandra-2.2.3.jar:2.2.3]
    at org.apache.cassandra.utils.memory.ContextAllocator.clone(ContextAllocator.java:47) ~[apache-cassandra-2.2.3.jar:2.2.3]
    at org.apache.cassandra.utils.memory.MemtableBufferAllocator.clone(MemtableBufferAllocator.java:61) ~[apache-cassandra-2.2.3.jar:2.2.3]
    at org.apache.cassandra.db.Memtable.put(Memtable.java:209) ~[apache-cassandra-2.2.3.jar:2.2.3]
    at org.apache.cassandra.db.ColumnFamilyStore.apply(ColumnFamilyStore.java:1244) ~[apache-cassandra-2.2.3.jar:2.2.3]
    at org.apache.cassandra.db.Keyspace.apply(Keyspace.java:406) ~[apache-cassandra-2.2.3.jar:2.2.3]
    at org.apache.cassandra.db.Keyspace.apply(Keyspace.java:366) ~[apache-cassandra-2.2.3.jar:2.2.3]
    at org.apache.cassandra.db.Mutation.apply(Mutation.java:214) ~[apache-cassandra-2.2.3.jar:2.2.3]
    at org.apache.cassandra.db.MutationVerbHandler.doVerb(MutationVerbHandler.java:50) ~[apache-cassandra-2.2.3.jar:2.2.3]
    at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:66) ~[apache-cassandra-2.2.3.jar:2.2.3]
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) ~[na:1.7.0_80]
    at org.apache.cassandra.concurrent.AbstractTracingAwareExecutorService$FutureTask.run(AbstractTracingAwareExecutorService.java:164) ~[apache-cassandra-2.2.3.jar:2.2.3]
    at org.apache.cassandra.concurrent.SEPWorker.run(SEPWorker.java:105) [apache-cassandra-2.2.3.jar:2.2.3]
    at java.lang.Thread.run(Thread.java:745) [na:1.7.0_80]

and this:

ERROR [SharedPool-Worker-126] 2015-10-30 14:02:04,049 SEPWorker.java:141 - Failed to execute task, unexpected exception killed worker: {}
java.lang.IllegalStateException: Shutdown in progress
    at java.lang.ApplicationShutdownHooks.remove(ApplicationShutdownHooks.java:82) ~[na:1.7.0_80]
    at java.lang.Runtime.removeShutdownHook(Runtime.java:239) ~[na:1.7.0_80]
    at org.apache.cassandra.service.StorageService.removeShutdownHook(StorageService.java:728) ~[apache-cassandra-2.2.3.jar:2.2.3]
    at org.apache.cassandra.utils.JVMStabilityInspector$Killer.killCurrentJVM(JVMStabilityInspector.java:119) ~[apache-cassandra-2.2.3.jar:2.2.3]
    at org.apache.cassandra.utils.JVMStabilityInspector$Killer.killCurrentJVM(JVMStabilityInspector.java:109) ~[apache-cassandra-2.2.3.jar:2.2.3]
    at org.apache.cassandra.utils.JVMStabilityInspector.inspectThrowable(JVMStabilityInspector.java:68) ~[apache-cassandra-2.2.3.jar:2.2.3]
    at org.apache.cassandra.concurrent.AbstractTracingAwareExecutorService$FutureTask.run(AbstractTracingAwareExecutorService.java:168) ~[apache-cassandra-2.2.3.jar:2.2.3]
    at org.apache.cassandra.concurrent.SEPWorker.run(SEPWorker.java:105) ~[apache-cassandra-2.2.3.jar:2.2.3]
    at java.lang.Thread.run(Thread.java:745) [na:1.7.0_80]

and the repair fails:

Repair session 24406790-8014-11e5-bf74-a1fb6926eba3 for range (8334390792461377170,8383846774169681811] failed with error Endpoint /x.x.x.x died

I've also tried running nodetool repair -seq, and the result is the same.

Questions:

  • How much memory does nodetool repair need? How can I check it?
  • How can I rebalance the ring now? Is there any way to trigger the repair step by step?
  • If not, can I add "virtual" RAM (maybe as swap), increase the heap, and trigger the repair?
piotrwest

1 Answer

Running repair doesn't rebalance the ring. What you want is to run nodetool rebuild on the new node to stream data to it.
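
On the "step by step" part of the question: nodetool repair accepts -st and -et (start and end token), so a large range can be repaired in smaller slices instead of all at once. Below is a minimal sketch, not a definitive procedure, that splits the range from the failed session into equal slices and prints one command per slice. The helper names, the number of slices, and the keyspace name my_keyspace are assumptions made for illustration, and the sketch assumes a range that does not wrap around the ring.

```python
def split_token_range(start, end, steps):
    """Split the token range (start, end] into up to `steps` contiguous slices.

    Assumes a non-wrapping range, i.e. end > start.
    """
    if steps < 1:
        raise ValueError("steps must be at least 1")
    if end <= start:
        raise ValueError("expected a non-wrapping range with end > start")
    width = end - start
    # Integer arithmetic keeps the boundaries exact for 64-bit tokens.
    bounds = [start + width * i // steps for i in range(steps + 1)]
    # Drop empty slices, which can occur if steps exceeds the range width.
    return [(lo, hi) for lo, hi in zip(bounds, bounds[1:]) if lo < hi]


def repair_commands(start, end, steps, keyspace):
    """Build one `nodetool repair -st ... -et ...` command per slice."""
    return [
        f"nodetool repair -st {lo} -et {hi} {keyspace}"
        for lo, hi in split_token_range(start, end, steps)
    ]


if __name__ == "__main__":
    # Range taken from the failed repair session in the question;
    # "my_keyspace" and the slice count of 4 are placeholders.
    for cmd in repair_commands(
        8334390792461377170, 8383846774169681811, 4, "my_keyspace"
    ):
        print(cmd)
```

Running the slices one at a time keeps each repair session small; whether that is enough to stay within the heap depends on how much data each slice has to stream.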

rs_atl
  • Thank you for your answer! Do you happen to know "How much memory does nodetool repair need? How to check it?" Just curious. – piotrwest Nov 02 '15 at 14:22
  • It depends on how much data needs to be streamed during the repair. – rs_atl Nov 03 '15 at 13:38
  • This answer is wrong. You want to run repair. rebuild is for adding a new datacenter. I know this is an old thread, but it still pops up in search hits. This other SO thread explains it much better: https://stackoverflow.com/a/20201289 – diq Jun 20 '21 at 20:59