
On May 3rd we deployed. Our 3-node Cassandra cluster became extremely slow and many web requests were timing out. By EOD May 3rd we had added another m1.large machine to the cluster, which resolved the timeouts. That said, the cluster was still extremely slow, so on May 4th we launched five i3.xlarge nodes. This helped our application response time considerably; on May 5th we removed the old m1.large boxes from the cluster. As of EOD May 5th everything was quick and responsive. This morning, the application began timing out again.

We have noticed some odd CPU utilization behaviour: the CPU usage fluctuates between 100% and 200% regardless of load (these are four-core machines). We have extremely light weekends with absolutely no load and relatively heavy Monday loads, yet we see absolutely no change in CPU usage.

As you can see in the two-week graph below, our database CPU usage was once bound to application usage. You can see the large spike on the 3rd, the introduction of the new machines on the 4th, and the stable high CPU usage starting on the 6th.

[Graph: Database CPU Utilization]

We have spent a good amount of time trying to identify the cause of the CPU usage and identified (and subsequently ruled out) three likely causes:

  1. High khugepaged CPU usage
  2. Poorly tuned garbage collection
  3. Poorly tuned compactions

Here is how we ruled each of them out:

  1. Our servers show 0.0% khugepaged CPU usage.
  2. Our GC throughput is about 96%. We have also tweaked the heap and new-generation sizes and switched over to G1 GC. Our logs once showed warnings about long GC pauses but no longer do, and the GC threads account for only a small amount of CPU usage.
  3. nodetool compactionstats returns 0 pending tasks. We have switched over to LeveledCompactionStrategy and set gc_grace_seconds to 1 day (the equivalent commands are sketched after this list). Our logs once showed warnings about large numbers of tombstones but no longer do. nodetool compactionhistory shows about one compaction per hour, and according to the logs they complete extremely quickly (< 1 second).
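
For reference, the compaction-related changes in point 3 amount to roughly the following; my_ks.my_table is a placeholder, not our real schema:

    # Illustrative only -- switch a table to LCS and shrink gc_grace_seconds
    # (my_ks.my_table is a placeholder table name).
    cqlsh -e "ALTER TABLE my_ks.my_table
              WITH compaction = {'class': 'LeveledCompactionStrategy'}
              AND gc_grace_seconds = 86400;"    # 86400 s = 1 day

    # Confirm compactions are keeping up afterwards.
    nodetool compactionstats
    nodetool compactionhistory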

It appears that Cassandra's SharedPoolWorker threads have very high usage. Here's one node's CPU usage by type of thread (they all look pretty similar):

84.6 SharedPoolWorker
22.1 Thrift
13.5 CCompilerThread
11.9 MessagingServiceOutgoing
9.4  MessagingServiceIncoming
3.6  GangworkerParallelGCThreads
1.6  DestroyJavaVM
.3   VMThread
.1   Thread
.1   ScheduledTasks
.1   OptionalTasks
0    ...
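
Per-thread numbers like the ones above can be gathered with a fairly generic top -H plus jstack approach; the sketch below is illustrative rather than the exact commands (the thread id is a placeholder):

    # Illustrative recipe for attributing CPU to JVM thread names.
    PID=$(pgrep -f CassandraDaemon | head -1)

    # Per-thread CPU in batch mode; the TID column is decimal.
    top -b -H -n 1 -p "$PID" | head -40

    # jstack prints thread ids in hex (nid=0x...), so convert a hot TID
    # from top and look up its name and stack in the thread dump.
    TID=12345                                   # placeholder thread id
    jstack "$PID" | grep -A 20 "$(printf 'nid=0x%x' "$TID")"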

Checking the state of the SharedPool-Worker threads shows that the vast majority are in WAITING with the following stack trace:

java.lang.Thread.State: WAITING (parking)
    at sun.misc.Unsafe.park(Native Method)
    at java.util.concurrent.locks.LockSupport.park(Unknown Source)
    at org.apache.cassandra.concurrent.SEPWorker.run(SEPWorker.java:85)
    at java.lang.Thread.run(Unknown Source)

I think this is related to the problem, but I am not sure why, since very little CPU time is spent waiting (consistently 0% according to dstat).

Now, interestingly enough, running nodetool tpstats on any given node shows a small number of active ReadStage threads and occasionally one or two pending. None are blocked, all-time blocked, or dropped.
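
One thing we can do is sample tpstats periodically to see whether ReadStage activity lines up with the timeouts; a simple illustrative loop:

    # Illustrative: sample thread-pool stats every 5 seconds and keep
    # the header plus the ReadStage line.
    while true; do
        date
        nodetool tpstats | grep -E 'Pool Name|ReadStage'
        sleep 5
    done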

Here's the output of nodetool cfstats, and here's nodetool netstats:

Mode: NORMAL
Not sending any streams.
Read Repair Statistics:
Attempted: 12229
Mismatch (Blocking): 2
Mismatch (Background): 0
Pool Name                    Active   Pending      Completed   Dropped
Commands                        n/a         0         707576         0
Responses                       n/a         0         859216       n/a

Does anyone have any ideas about why this might be occurring? Is there anything else we could look into?

cscan

2 Answers


It can be related to a high number of tombstones or a high number of SSTables scanned per read - that creates constant high CPU load and slow responses because of the large amount of work required for every request.

These symptoms can appear, for example, when using STCS with data that is constantly and frequently updated (rows being updated, not new rows being added).

Can you add the nodetool tablestats/cfstats output for your main tables to the question?
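
For reference, something along these lines would show both suspects; the keyspace/table names are placeholders:

    # Illustrative commands (my_ks / my_table are placeholders).
    # Per-read SSTable count and latency distributions for one table:
    nodetool cfhistograms my_ks my_table

    # Per-table stats, including the tombstones-per-slice figures:
    nodetool cfstats my_ks.my_table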

nevsv
  • I have updated the question to include `nodetool cfstats` and `nodetool netstats`. It looks like `nodetool tablestats` was first released with Cassandra 3. We also switched over to LeveledCompactionStrategy for that very reason. – cscan May 09 '17 at 17:29
  • Going to LCS should be a good choice. If data is updated very frequently, consider setting gc_grace_seconds to a lower value. About cfstats: the table edgestore looks problematic, with a high number of tombstones read per slice. – nevsv May 10 '17 at 07:31

The problem was actually our API: it was having GC problems, which led to a large number of DB read/write threads being frozen.
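
If it helps anyone else: assuming the API is also a JVM service, verbose GC logging along these lines (illustrative Java 8 flags, not necessarily exactly what we used) makes the pauses easy to spot:

    # Illustrative Java 8 GC-logging flags for the API process (not Cassandra).
    java -verbose:gc \
         -XX:+PrintGCDetails \
         -XX:+PrintGCDateStamps \
         -Xloggc:/var/log/api/gc.log \
         -jar api.jar                           # placeholder jar name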

cscan