
I have a 5-node Apache Cassandra 2.0.6 cluster. Each node has 48 GB of RAM, a 2 TB data directory, and a commit log directory with 93 GB of capacity; the JVM heap for Cassandra is 8 GB. I use the JVisualVM MBeans plugin to monitor Cassandra metrics. Hints are being created continuously on all nodes even though all the nodes are up. Because hints keep being written, after some time I hit a tombstone overwhelming exception that aborts queries. Could anyone explain why this is happening and suggest a remedy?
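A rough way to confirm that hints are still being generated and delivered is to watch the HintedHandoff thread pool (a sketch using standard nodetool output; exact pool names and counters can vary by version):

    nodetool tpstats
    # Look at the HintedHandoff pool: non-zero Active/Pending counts while all
    # nodes are up means hints are still being written or replayed.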

venkat sam
  • Do you see something suspicious in your logs? – Mikhail Stepura Aug 26 '14 at 17:44
  • Yes. My logs are filled with tombstone overwhelming exceptions every 10 minutes. – venkat sam Aug 28 '14 at 06:11
  • ERROR [HintedHandoff:1308] 2014-08-28 06:34:33,727 CassandraDaemon.java (line 196) Exception in thread Thread[HintedHandoff:1308,1,main]
    ERROR [HintedHandoff:1309] 2014-08-28 06:44:33,077 SliceQueryFilter.java (line 200) Scanned over 200000 tombstones in system.hints; query aborted (see tombstone_fail_threshold)
    ERROR [HintedHandoff:1309] 2014-08-28 06:44:33,078 CassandraDaemon.java (line 196) Exception in thread Thread[HintedHandoff:1309,1,main] – venkat sam Aug 28 '14 at 06:12
  • Well, that's clear. Do you see something related to "node is down"? – Mikhail Stepura Aug 28 '14 at 06:22
  • Right now I am not getting a node-down error. But a week back I got an error stating that "Gossiper is down and Native thrift is down". My older logs were purged, so I can't provide the exact log statement. – venkat sam Aug 28 '14 at 10:49

1 Answer


The tombstone overwhelming exception on the hints table is a known issue, and there are JIRAs open to improve the situation.

Are you getting the tombstone ERROR or tombstone WARN in your logs? If you are hitting the tombstone ERROR then you will want to temporarily increase the threshold to avoid the error and allow your hints to process.
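For example, in cassandra.yaml (a sketch; 1000 and 100000 are the 2.0 defaults, and the raised value is only a temporary workaround while the hints drain):

    # cassandra.yaml -- temporary change, revert once hints have processed
    tombstone_warn_threshold: 1000
    tombstone_failure_threshold: 200000   # raised from the 100000 default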

If your cluster continues to generate hints regularly under normal operations, it is clearly overwhelmed in some fashion, and that issue needs to be addressed so that hints are not required during normal operation. The most likely cause is long GC pauses. Do you see "GC for" messages in your system logs? If so, how long are the pauses on average (in ms) and how frequent are they? How many are ParNew vs. ConcurrentMarkSweep?
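A quick way to survey those pauses is to grep the GCInspector lines out of the system log (a sketch; adjust the log path for your install):

    grep "GC for ParNew" /var/log/cassandra/system.log | tail -n 20
    grep "GC for ConcurrentMarkSweep" /var/log/cassandra/system.log | tail -n 20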

dkblinux98
  • Thanks for the reply. I am getting tombstone errors, not warnings. I tried increasing tombstone_failure_threshold from 100000 to 200000; as you said, it temporarily fixed the error, but within a few hours the error started to reappear. – venkat sam Aug 30 '14 at 08:10
  • Regarding GC pauses: on average, ParNew takes 300 ms and runs every minute; ConcurrentMarkSweep takes 250 ms and runs irregularly, roughly every 10 minutes. Right now we have write_request_timeout_in_ms: 2000 and we plan to increase it to 10000 ms (assuming the hints are generated by write failures, since our cluster writes about 170 GB per day). Can you please explain how GC triggers hints, and whether my current ParNew and ConcurrentMarkSweep values are okay? If not, how should I tune them? – venkat sam Aug 30 '14 at 08:11
  • For your reference I am providing the system log below – venkat sam Aug 30 '14 at 08:12
  • INFO [ScheduledTasks:1] 2014-08-27 08:56:51,354 GCInspector.java (line 116) GC for ParNew: 304 ms for 1 collections, 5079302472 used; max is 8422162432
    INFO [ScheduledTasks:1] 2014-08-27 08:57:20,763 GCInspector.java (line 116) GC for ConcurrentMarkSweep: 203 ms for 1 collections, 5366351976 used; max is 8422162432
    INFO [ScheduledTasks:1] 2014-08-27 08:58:13,529 GCInspector.java (line 116) GC for ParNew: 231 ms for 1 collections, 2170074400 used; max is 8422162432 – venkat sam Aug 30 '14 at 08:12
  • Long ParNew and CMS pauses will cause the node to appear down to the coordinator, which then stores hints. So improving GC will also improve hint behavior and overall write performance. You probably need to increase HEAP_NEWSIZE in cassandra-env.sh. Try setting it to 1024M to start if it's at 800M now, do a rolling restart, and monitor the GC events in the logs (see the sketch below). – dkblinux98 Sep 01 '14 at 17:17
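A minimal sketch of the change suggested above, assuming the stock cassandra-env.sh layout (values are illustrative, not a sizing recommendation):

    # cassandra-env.sh
    MAX_HEAP_SIZE="8G"       # current heap size from the question
    HEAP_NEWSIZE="1024M"     # raised from ~800M; a larger new gen can reduce ParNew frequency
    # Apply with a rolling restart, then watch the "GC for" lines in system.log.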