I have a Cassandra node with 64 GB of RAM (16 GB heap, G1GC), and occasionally this happens:
ERROR [Thrift:74] 2017-06-11 13:20:25,710 CassandraDaemon.java:229 - Exception in thread Thread[Thrift:74,5,main]
java.lang.OutOfMemoryError: Java heap space
at java.util.Arrays.copyOf(Arrays.java:3236) ~[na:1.8.0_45]
at java.io.ByteArrayOutputStream.grow(ByteArrayOutputStream.java:118) ~[na:1.8.0_45]
at java.io.ByteArrayOutputStream.ensureCapacity(ByteArrayOutputStream.java:93) ~[na:1.8.0_45]
at java.io.ByteArrayOutputStream.write(ByteArrayOutputStream.java:153) ~[na:1.8.0_45]
at org.apache.thrift.transport.TFramedTransport.write(TFramedTransport.java:146) ~[libthrift-0.9.2.jar:0.9.2]
at org.apache.thrift.protocol.TBinaryProtocol.writeBinary(TBinaryProtocol.java:211) ~[libthrift-0.9.2.jar:0.9.2]
at org.apache.cassandra.thrift.Column$ColumnStandardScheme.write(Column.java:678) ~[apache-cassandra-thrift-2.1.13.jar:2.1.13]
at org.apache.cassandra.thrift.Column$ColumnStandardScheme.write(Column.java:611) ~[apache-cassandra-thrift-2.1.13.jar:2.1.13]
at org.apache.cassandra.thrift.Column.write(Column.java:538) ~[apache-cassandra-thrift-2.1.13.jar:2.1.13]
at org.apache.cassandra.thrift.ColumnOrSuperColumn$ColumnOrSuperColumnStandardScheme.write(ColumnOrSuperColumn.java:673) ~[apache-cassandra-thrift-2.1.13.jar:2.1.13]
at org.apache.cassandra.thrift.ColumnOrSuperColumn$ColumnOrSuperColumnStandardScheme.write(ColumnOrSuperColumn.java:607) ~[apache-cassandra-thrift-2.1.13.jar:2.1.13]
at org.apache.cassandra.thrift.ColumnOrSuperColumn.write(ColumnOrSuperColumn.java:517) ~[apache-cassandra-thrift-2.1.13.jar:2.1.13]
at org.apache.cassandra.thrift.Cassandra$multiget_slice_result$multiget_slice_resultStandardScheme.write(Cassandra.java:14729) ~[apache-cassandra-thrift-2.1.13.jar:2.1.13]
at org.apache.cassandra.thrift.Cassandra$multiget_slice_result$multiget_slice_resultStandardScheme.write(Cassandra.java:14633) ~[apache-cassandra-thrift-2.1.13.jar:2.1.13]
at org.apache.cassandra.thrift.Cassandra$multiget_slice_result.write(Cassandra.java:14563) ~[apache-cassandra-thrift-2.1.13.jar:2.1.13]
at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:53) ~[libthrift-0.9.2.jar:0.9.2]
at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39) ~[libthrift-0.9.2.jar:0.9.2]
at org.apache.cassandra.thrift.CustomTThreadPoolServer$WorkerProcess.run(CustomTThreadPoolServer.java:205) ~[apache-cassandra-2.1.13.jar:2.1.13]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) ~[na:1.8.0_45]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) ~[na:1.8.0_45]
at java.lang.Thread.run(Thread.java:745) ~[na:1.8.0_45]
The obvious response would be to increase the heap, but I'm reluctant to do so without knowing the cause.
It looks like the thread was writing a multiget_slice result into the Thrift frame buffer when it ran out of heap space. Here's a jconsole view of the node when it crashed: (screenshot)
We're using KairosDB, and it shows about 5k writes/s for this node (kairosdb.datastore.write_size).
In the meantime I've set CASSANDRA_HEAPDUMP_DIR somewhere it's allowed to write so I can take a look further.
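To make sense of the top frames of the trace, here is a minimal sketch of what I understand them to be doing. It is not Cassandra or Thrift code: `FrameBufferSketch`, `ProbeStream`, and `capacityAfter` are names made up for illustration, and the sketch assumes the framed transport accumulates the whole response in a `ByteArrayOutputStream`-style buffer before sending it, which is what the `TFramedTransport.write` and `ByteArrayOutputStream.grow` frames suggest.

```java
import java.io.ByteArrayOutputStream;

public class FrameBufferSketch {

    // Hypothetical helper: exposes the protected backing array of
    // ByteArrayOutputStream so its capacity can be observed.
    static class ProbeStream extends ByteArrayOutputStream {
        ProbeStream(int initialCapacity) {
            super(initialCapacity);
        }

        int capacity() {
            return buf.length;
        }
    }

    // Writes `chunks` chunks of `chunkSize` bytes into one in-memory buffer
    // and returns the capacity of the backing array afterwards.
    static int capacityAfter(int chunks, int chunkSize) {
        ProbeStream out = new ProbeStream(32);
        byte[] chunk = new byte[chunkSize];
        for (int i = 0; i < chunks; i++) {
            // When the backing array is full, the stream allocates a larger
            // array (via Arrays.copyOf) and copies the old contents over,
            // so the old and new arrays are both live during the copy.
            out.write(chunk, 0, chunk.length);
        }
        return out.capacity();
    }

    public static void main(String[] args) {
        int chunks = 100;
        int chunkSize = 1024;
        int payload = chunks * chunkSize;
        int capacity = capacityAfter(chunks, chunkSize);
        System.out.println("payload bytes:  " + payload);
        System.out.println("buffer capacity: " + capacity);
    }
}
```

If that assumption holds, the heap cost of a single response is at least the size of the full serialized result, plus a temporary extra copy each time the buffer grows, so one very large multiget_slice result could plausibly trigger the error even when the heap otherwise looks healthy.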
Some config vars:
- key_cache_size_in_mb: 256
- row_cache_size_in_mb: 0
- concurrent_reads: 128
- concurrent_writes: 64
- concurrent_counter_writes: 64
- memtable_allocation_type: offheap_objects
- concurrent_compactors: 1
- compaction_throughput_mb_per_sec: 48
Any ideas/advice/pointers?
Thanks!
EDIT: Another node died, this time with the following heap output, which seems to point at JMX? (screenshot)