
I have 6 Cassandra nodes with around 6GB of data on each, except one, which has around 45GB.

This is because that node is holding around 39GB of hints.

All my nodes are up and fully functional, so I don't understand why this node is keeping so many hints.

This node keeps getting killed with the following exception:

 INFO [ScheduledTasks:1] 2015-08-18 03:07:28,941 StatusLogger.java (line 55) Pool Name                    Active   Pending      Completed   Blocked  All Time Blocked
 INFO [ScheduledTasks:1] 2015-08-18 03:08:43,100 StatusLogger.java (line 70) ReadStage                         2         2       24803657         0                 0
 INFO [ScheduledTasks:1] 2015-08-18 03:08:43,101 StatusLogger.java (line 70) RequestResponseStage              0         0       49288185         0                 0
 INFO [ScheduledTasks:1] 2015-08-18 03:08:43,101 StatusLogger.java (line 70) ReadRepairStage                   0         0        2607139         0                 0
 INFO [ScheduledTasks:1] 2015-08-18 03:08:43,102 StatusLogger.java (line 70) MutationStage                     2        12       15871730         0                 0
 INFO [ScheduledTasks:1] 2015-08-18 03:08:43,103 StatusLogger.java (line 70) ReplicateOnWriteStage             0         0              0         0                 0
 INFO [ScheduledTasks:1] 2015-08-18 03:08:43,103 StatusLogger.java (line 70) GossipStage                       0         0        3318395         0                 0
 INFO [ScheduledTasks:1] 2015-08-18 03:08:43,104 StatusLogger.java (line 70) AntiEntropyStage                  0         0              0         0                 0
ERROR [ReadStage:398] 2015-08-18 03:08:43,104 CassandraDaemon.java (line 187) Exception in thread Thread[ReadStage:398,5,main]
java.lang.OutOfMemoryError: Java heap space
        at org.apache.cassandra.io.util.RandomAccessReader.readBytes(RandomAccessReader.java:344)
        at org.apache.cassandra.utils.ByteBufferUtil.read(ByteBufferUtil.java:392)
        at org.apache.cassandra.utils.ByteBufferUtil.readWithLength(ByteBufferUtil.java:355)
        at org.apache.cassandra.db.ColumnSerializer.deserializeColumnBody(ColumnSerializer.java:118)
        at org.apache.cassandra.db.OnDiskAtom$Serializer.deserializeFromSSTable(OnDiskAtom.java:85)
        at org.apache.cassandra.db.Column$1.computeNext(Column.java:75)
        at org.apache.cassandra.db.Column$1.computeNext(Column.java:64)
        at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:143)
        at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:138)
        at org.apache.cassandra.db.columniterator.SimpleSliceReader.computeNext(SimpleSliceReader.java:88)
        at org.apache.cassandra.db.columniterator.SimpleSliceReader.computeNext(SimpleSliceReader.java:37)
        at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:143)
        at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:138)
        at org.apache.cassandra.db.columniterator.SSTableSliceIterator.hasNext(SSTableSliceIterator.java:82)
        at org.apache.cassandra.db.columniterator.LazyColumnIterator.computeNext(LazyColumnIterator.java:82)
        at org.apache.cassandra.db.columniterator.LazyColumnIterator.computeNext(LazyColumnIterator.java:59)
        at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:143)
        at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:138)
        at org.apache.cassandra.db.filter.QueryFilter$2.getNext(QueryFilter.java:157)
        at org.apache.cassandra.db.filter.QueryFilter$2.hasNext(QueryFilter.java:140)
        at org.apache.cassandra.utils.MergeIterator$Candidate.advance(MergeIterator.java:144)
        at org.apache.cassandra.utils.MergeIterator$ManyToOne.advance(MergeIterator.java:123)
        at org.apache.cassandra.utils.MergeIterator$ManyToOne.computeNext(MergeIterator.java:97)
        at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:143)
        at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:138)
        at org.apache.cassandra.db.filter.SliceQueryFilter.collectReducedColumns(SliceQueryFilter.java:185)
        at org.apache.cassandra.db.filter.QueryFilter.collateColumns(QueryFilter.java:122)
        at org.apache.cassandra.db.filter.QueryFilter.collateOnDiskAtom(QueryFilter.java:80)
        at org.apache.cassandra.db.RowIteratorFactory$2.getReduced(RowIteratorFactory.java:101)
        at org.apache.cassandra.db.RowIteratorFactory$2.getReduced(RowIteratorFactory.java:75)
        at org.apache.cassandra.utils.MergeIterator$ManyToOne.consume(MergeIterator.java:115)
        at org.apache.cassandra.utils.MergeIterator$ManyToOne.computeNext(MergeIterator.java:98)
 INFO [ScheduledTasks:1] 2015-08-18 03:08:43,105 StatusLogger.java (line 70) MigrationStage                    0         0             32         0                 0
 INFO [ScheduledTasks:1] 2015-08-18 03:08:43,154 StatusLogger.java (line 70) MemtablePostFlusher               0         0          32691         0                 0
 INFO [ScheduledTasks:1] 2015-08-18 03:08:43,154 StatusLogger.java (line 70) MemoryMeter                       0         0            371         0                 0
 INFO [

Edit: nodetool -h 172.xxx.xxx.x23 tpstats

s =  -ea -javaagent:./../lib/jamm-0.2.5.jar -XX:+UseThreadPriorities -XX:ThreadPriorityPolicy=42 -Xms8G -Xmx8G -Xmn400M -XX:+HeapDumpOnOutOfMemoryError -Xss256k
Pool Name                    Active   Pending      Completed   Blocked  All time blocked
ReadStage                         0         0       24426641         0                 0
RequestResponseStage              0         0       48496365         0                 0
MutationStage                     0         0       15623599         0                 0
ReadRepairStage                   0         0        2562071         0                 0
ReplicateOnWriteStage             0         0              0         0                 0
GossipStage                       0         0        3268659         0                 0
AntiEntropyStage                  0         0              0         0                 0
MigrationStage                    0         0             32         0                 0
MemoryMeter                       0         0            371         0                 0
MemtablePostFlusher               0         0          32263         0                 0
FlushWriter                       0         0          18447         0              1080
MiscStage                         0         0              0         0                 0
PendingRangeCalculator            0         0              8         0                 0
commitlog_archiver                0         0              0         0                 0
InternalResponseStage             0         0             12         0                 0
HintedHandoff                     2         2           1194         0                 0

Message type           Dropped
RANGE_SLICE                  0
READ_REPAIR                  0
PAGED_RANGE                  0
BINARY                       0
READ                         0
MUTATION                     0
_TRACE                       0
REQUEST_RESPONSE             0
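
To make the tpstats dump above easier to read, here is a minimal Python sketch that parses the pool table and lists the pools with any active, pending, or blocked work. The function names `parse_tpstats` and `busy_pools` are hypothetical helpers written for this illustration, not part of nodetool, and the sample text is a subset of the rows shown above.

```python
import re

# A few rows copied from the tpstats output above (subset, for illustration).
TPSTATS = """\
Pool Name                    Active   Pending      Completed   Blocked  All time blocked
ReadStage                         0         0       24426641         0                 0
MutationStage                     0         0       15623599         0                 0
FlushWriter                       0         0          18447         0              1080
HintedHandoff                     2         2           1194         0                 0
"""

# One pool name followed by exactly five integer columns.
ROW = re.compile(r"^(\S+)\s+(\d+)\s+(\d+)\s+(\d+)\s+(\d+)\s+(\d+)\s*$")
FIELDS = ("active", "pending", "completed", "blocked", "all_time_blocked")


def parse_tpstats(text):
    """Return {pool_name: {field: int}} for every row that looks like a pool line."""
    pools = {}
    for line in text.splitlines():
        match = ROW.match(line)
        if match:  # the header and blank lines do not match and are skipped
            values = [int(v) for v in match.groups()[1:]]
            pools[match.group(1)] = dict(zip(FIELDS, values))
    return pools


def busy_pools(pools):
    """Names of pools with active, pending, or (all-time) blocked tasks, sorted."""
    return sorted(
        name
        for name, stats in pools.items()
        if stats["active"] or stats["pending"]
        or stats["blocked"] or stats["all_time_blocked"]
    )


if __name__ == "__main__":
    print(busy_pools(parse_tpstats(TPSTATS)))
```

On the rows above, only FlushWriter (because of its all-time blocked count) and HintedHandoff (active and pending tasks) are flagged.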

EDIT

endpoint_snitch: GossipingPropertyFileSnitch.

CREATE KEYSPACE fbkeyspace_r2 WITH replication = {
  'class': 'NetworkTopologyStrategy',
  'DC1': '2',
  'DC24': '2'
};

./nodetool netstats

Mode: NORMAL
Not sending any streams.
Read Repair Statistics:
Attempted: 235114
Mismatch (Blocking): 0
Mismatch (Background): 20146
Pool Name                    Active   Pending      Completed
Commands                        n/a         0        4452633
Responses                       n/a         0        2603124

./nodetool -h 172.xxx.xxx.80 netstats

Mode: NORMAL
    Not sending any streams.
    Read Repair Statistics:
    Attempted: 30581744
    Mismatch (Blocking): 0
    Mismatch (Background): 2973864
    Pool Name                    Active   Pending      Completed
    Commands                        n/a         0      525946068
    Responses                       n/a         0      526266474
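
For comparing the two netstats outputs above, a rough Python sketch of the arithmetic follows. The helper `mismatch_ratio` is a hypothetical name introduced here. The sketch assumes these counters are cumulative since each process started, so a recently restarted node will naturally show smaller totals.

```python
def mismatch_ratio(attempted, blocking, background):
    """Fraction of attempted read repairs that found a mismatch."""
    if attempted == 0:
        return 0.0
    return (blocking + background) / attempted


# Values copied from the two netstats outputs above.
local_ratio = mismatch_ratio(attempted=235114, blocking=0, background=20146)
other_ratio = mismatch_ratio(attempted=30581744, blocking=0, background=2973864)

# Completed "Commands" on the .80 node relative to the local node.
commands_factor = 525946068 / 4452633

if __name__ == "__main__":
    print(f"local mismatch ratio: {local_ratio:.4f}")
    print(f"other mismatch ratio: {other_ratio:.4f}")
    print(f"commands completed, other/local: {commands_factor:.1f}x")
```

The mismatch ratios come out similar on both nodes (roughly 9% and 10%), while the completed command counts differ by about two orders of magnitude. That gap may reflect uptime since the last restart rather than coordinator load.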
Aftab
  • What is your snitch and topology? You should check whether this node has a non-existent node in its list of known nodes. It might be trying to contact a node that doesn't exist anymore – sam Aug 19 '15 at 08:19
  • Also, could you give us the cfhistograms for this node? You should be able to see which table it is keeping hints for in the hints table – sam Aug 19 '15 at 08:21
  • The netstats for this node compared with the netstats of the others would also be interesting; it would tell us whether this node is often the coordinator – sam Aug 19 '15 at 08:30
  • Hi sam, I have updated the question. How do I get the column family name for which hints are being stored? – Aftab Aug 19 '15 at 09:01
  • You have to look inside the system.hints table http://docs.datastax.com/en/cassandra/2.0/cassandra/dml/dml_about_hh_c.html?scroll=concept_ds_ifg_jqx_zj__how-hinted-handoff-works ; in this table you should see the tables for which the hints are saved and, more importantly, the IP of the target. Is it an IP that you expected? – sam Aug 19 '15 at 09:03
  • Of the two netstats you gave, which one is from the node that is shutting down? – sam Aug 19 '15 at 09:05
  • The first one, with ./nodetool netstats. I am trying to access the system.hints table using cqlsh but can't; it's too big. – Aftab Aug 19 '15 at 09:14
  • I will wager that the tpstats you gave us was taken not before the node shut down but after restarting? – sam Aug 19 '15 at 09:17
  • Yes, and now no hints are running. – Aftab Aug 19 '15 at 09:18
  • Did this problem happen only once, or is it happening again? Is the size of the hinted handoff data still growing? – sam Aug 19 '15 at 09:20
  • No, it's not growing, but the node got killed today as well. – Aftab Aug 19 '15 at 09:22
  • Let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/87355/discussion-between-sam-and-aftab). – sam Aug 19 '15 at 09:24
