
This is a situation at the company I work for that's costing me sleep, since I'm up against a wall and running out of options. I have a production cluster of 51 Cassandra 3.11.9 nodes under a large load (600 to 800 GB per node) that is presenting a very annoying issue: for no apparent reason, these machines start hitting long GC pauses that really hurt my application's response time and create latency spikes, since the consistency level on the clients is ONE.

Here are some samples from the GC logs of a few nodes:

28 | CHANGED | rc=0 >>
INFO  [Service Thread] 2023-07-14 01:56:45,583 GCInspector.java:285 - G1 Old Generation GC in 12039ms.  G1 Old Gen: 54648149088 -> 12146746552; G1 Survivor Space: 134217728 -> 0;
INFO  [Service Thread] 2023-07-14 02:14:24,394 GCInspector.java:285 - G1 Old Generation GC in 57918ms.  G1 Old Gen: 67780410096 -> 59704216816; Metaspace: 61436792 -> 61302392
INFO  [Service Thread] 2023-07-14 02:15:44,506 GCInspector.java:285 - G1 Old Generation GC in 64576ms.  G1 Old Gen: 67971190408 -> 64736391536;
INFO  [Service Thread] 2023-07-14 02:17:06,520 GCInspector.java:285 - G1 Old Generation GC in 66242ms.  G1 Old Gen: 68043573704 -> 66792790424;
INFO  [Service Thread] 2023-07-14 02:21:31,210 GCInspector.java:285 - G1 Old Generation GC in 257268ms.  G1 Old Gen: 68046631936 -> 67703054448;

254 | CHANGED | rc=0 >>
INFO  [Service Thread] 2023-07-14 02:36:26,170 GCInspector.java:285 - G1 Old Generation GC in 46654ms.  G1 Old Gen: 133621345752 -> 49403423024; Metaspace: 67436096 -> 67339688
INFO  [Service Thread] 2023-07-14 02:38:58,627 GCInspector.java:285 - G1 Old Generation GC in 89392ms.  G1 Old Gen: 133594285096 -> 103157948104;
INFO  [Service Thread] 2023-07-14 02:40:59,754 GCInspector.java:285 - G1 Old Generation GC in 93345ms.  G1 Old Gen: 135071359720 -> 105377369048; G1 Survivor Space: 33554432 -> 0;
INFO  [Service Thread] 2023-07-14 02:43:29,171 GCInspector.java:285 - G1 Old Generation GC in 106174ms.  G1 Old Gen: 133812654600 -> 119264140552; G1 Survivor Space: 234881024 -> 0;
INFO  [Service Thread] 2023-07-14 02:45:36,900 GCInspector.java:285 - G1 Old Generation GC in 95625ms.  G1 Old Gen: 135225564784 -> 99943593104;
INFO  [Service Thread] 2023-07-14 02:46:53,820 GCInspector.java:285 - G1 Old Generation GC in 55875ms.  G1 Old Gen: 133359614104 -> 60924511688; G1 Survivor Space: 872415232 -> 0;
INFO  [Service Thread] 2023-07-14 02:48:22,803 GCInspector.java:285 - G1 Old Generation GC in 38493ms.  G1 Old Gen: 133978126912 -> 36277631424;
INFO  [Service Thread] 2023-07-14 02:50:11,320 GCInspector.java:285 - G1 Old Generation GC in 34789ms.  G1 Old Gen: 134004918888 -> 35377344368;

250 | CHANGED | rc=0 >>
INFO  [Service Thread] 2023-07-14 00:18:52,262 GCInspector.java:285 - G1 Old Generation GC in 96017ms.  G1 Old Gen: 73628910144 -> 59159105432; Metaspace: 58018496 -> 57907432
INFO  [Service Thread] 2023-07-14 00:46:41,400 GCInspector.java:285 - G1 Old Generation GC in 30177ms.  G1 Old Gen: 41448088568 -> 24094354384; G1 Survivor Space: 67108864 -> 0;
INFO  [Service Thread] 2023-07-14 02:18:34,910 GCInspector.java:285 - G1 Old Generation GC in 40940ms.  G1 Old Gen: 74016882928 -> 27759131352; Metaspace: 57315192 -> 57128720
INFO  [Service Thread] 2023-07-14 02:36:02,256 GCInspector.java:285 - G1 Old Generation GC in 57658ms.  G1 Old Gen: 73488401080 -> 40838191112; Metaspace: 54701984 -> 54651552
INFO  [Service Thread] 2023-07-14 02:37:47,374 GCInspector.java:285 - G1 Old Generation GC in 87036ms.  G1 Old Gen: 73498188264 -> 65920831896;
INFO  [Service Thread] 2023-07-14 02:39:58,921 GCInspector.java:285 - G1 Old Generation GC in 111435ms.  G1 Old Gen: 73496794000 -> 70079092144;

In recent months I've tried several things, such as:

  • Increasing the instance type (and the JVM heap with it); the problem just takes longer to show up, but it still happens.
  • Removing the nodes presenting this issue, but new nodes started having the problem.
  • Using G1GC and different JVM vendors, such as the one from Azul.

Currently, the following options are not available to me:

  • Changing the wiring or the data modeling, since that depends on other teams.
  • Upgrading Cassandra to version 4; the application needs to be updated first.

Right now, the only workaround I have is running "disablebinary" on the affected nodes to avoid the latency peaks, but that is not a good solution.
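
In case it helps to see it, this is roughly the command sequence involved (a sketch; nodetool is run locally on the affected node):

# Stop accepting CQL client connections on this node; gossip and internal
# traffic keep running, so the node stays in the cluster
nodetool disablebinary

# Check whether the native transport is currently active
nodetool statusbinary

# Re-enable client connections once the GC pauses have passed
nodetool enablebinary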

The JVM arguments:

INFO  [main] 2023-07-17 18:40:11,668 CassandraDaemon.java:507 - JVM Arguments: [-javaagent:/opt/simility/include/exporters/jmxexporter/jmx_prometheus_javaagent-0.12.0.jar=7070:/opt/simility/include/exporters/jmxexporter/cassandra.yml, -ea, -javaagent:/opt/simility/include/cassandra/lib/jamm-0.3.0.jar, -XX:+UseThreadPriorities, -XX:ThreadPriorityPolicy=42, -Xms64G, -Xmx64G, -XX:+ExitOnOutOfMemoryError, -Xss256k, -XX:StringTableSize=1000003, -XX:+UseG1GC, -XX:G1RSetUpdatingPauseTimePercent=5, -XX:MaxGCPauseMillis=500, -Djava.net.preferIPv4Stack=true, -Dcassandra.config=file:///opt/simility/conf/cassandra/cassandra.yaml, -Djava.rmi.server.hostname=172.33.135.28, -Dcom.sun.management.jmxremote.port=7199, -Dcom.sun.management.jmxremote.rmi.port=7199, -Dcom.sun.management.jmxremote.ssl=false, -Dcom.sun.management.jmxremote.authenticate=false, -Dcassandra.libjemalloc=/lib64/libjemalloc.so.1, -Dlogback.configurationFile=logback.xml, -Dcassandra.config=file:///opt/simility/conf/cassandra/cassandra.yaml, -Dcassandra.logdir=/opt/simility/log/cassandra, -Dcassandra.storagedir=/opt/simility/include/cassandra/data]

Thank you!


  • Adding the JVM settings being used to the question would help; the heap looks to be set too high, but we would want to see the JVM settings before providing recommendations. – Andrew Jul 14 '23 at 09:15
  • Hi Andrew, I have just added the JVM settings to the post. I have found several scenarios that create this, but whether using 60% or 95% of the RAM for the heap, all of them led us to the same outcome: the more memory, the longer it takes for the GC pause to happen. Once, we set 190 GB as Xmx and it took days for the GC to happen, but it happened again. – Gabriel Dias Jul 17 '23 at 20:25
  • I wouldn't put the heap above 32765M, or 31G for convenience, to ensure compressed OOPs (COOPS) are enabled, and make sure you have ParallelGCThreads / ConcGCThreads set to the core count when using G1GC, so that it uses all the CPU to complete the GC faster. – Andrew Jul 18 '23 at 12:58
  • @Andrew, following this, I just set the following parameters on one of the machines that is presenting the issue: MAX_HEAP_SIZE="31G" HEAP_NEWSIZE="8G" JVM_OPTS="$JVM_OPTS -XX:ParallelGCThreads=32" JVM_OPTS="$JVM_OPTS -XX:ConcGCThreads=8" (see the consolidated sketch after these comments). It's worth mentioning that it previously had 128 GB of heap (it has 244 GB of RAM and 32 cores) and no parameters for the GC threads. I will keep monitoring it and update the post later. – Gabriel Dias Jul 20 '23 at 00:50
  • I would set both ParallelGCThreads and ConcGCThreads to the same value, e.g. 32 if it's a 32-core machine. – Andrew Jul 20 '23 at 15:19
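
For clarity, the settings discussed in these comments would look roughly like this in cassandra-env.sh (a sketch; the thread counts assume the 32-core machine mentioned above and follow the last comment in setting both GC thread options to the core count):

# Cap the heap at 31G so compressed OOPs stay enabled
MAX_HEAP_SIZE="31G"
# Young generation size; typically only relevant for CMS, usually ignored with G1
HEAP_NEWSIZE="8G"
# Let stop-the-world GC phases use all 32 cores
JVM_OPTS="$JVM_OPTS -XX:ParallelGCThreads=32"
# Concurrent marking threads, set to the core count per the comment above
JVM_OPTS="$JVM_OPTS -XX:ConcGCThreads=32"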

1 Answer


The number one reason for this type of behavior is humongous allocations. I wrote up an article on it here:

https://medium.com/@stevenlacerda/identifying-and-fixing-humongous-allocations-in-cassandra-bf46444cec41

You can figure out whether the issue is humongous allocations by adding the following to your JVM settings:

-XX:+PrintGCDetails
-XX:+PrintGCTimeStamps 
-XX:+PrintHeapAtGC 
-XX:+PrintTenuringDistribution 
-XX:+PrintGCApplicationStoppedTime 
-XX:+PrintPromotionFailure 
-XX:PrintFLSStatistics=1
-Xloggc:<file-path>
-XX:+PrintAdaptiveSizePolicy
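
In Cassandra 3.11, one place these options can go is conf/jvm.options (one flag per line, read by the startup scripts); adding log rotation is usually a good idea so the GC log doesn't grow without bound. A sketch, with an example log path:

# conf/jvm.options
-Xloggc:/var/log/cassandra/gc.log
-XX:+UseGCLogFileRotation
-XX:NumberOfGCLogFiles=10
-XX:GCLogFileSize=10M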

Once you have the GC logs, you can look for humongous allocations in them:

14999:  6153.047: [G1Ergonomics (Concurrent Cycles) do not request concurrent cycle initiation, reason: still doing mixed collections, occupancy: 18253611008 bytes, allocation request: 4299032 bytes, threshold: 9663676380 bytes (45.00 %), source: concurrent humongous allocation]

That's an allocation of roughly 4.1 MiB. In this case, the heap region size was 8 MiB, and a humongous allocation is anything larger than half the region size. The region size is computed as:

Heap size in MiB / 2048, rounded down to the nearest power of 2 (i.e. 1, 2, 4, 8, 16, or 32 MiB, where 32 MiB is the max).
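
To make that concrete with the heap size from the question: 64 GiB is 65536 MiB, and 65536 / 2048 = 32 MiB, which is already the maximum, so the regions are 32 MiB and any single allocation larger than 16 MiB is humongous. With an 8 GiB heap it would be 8192 / 2048 = 4 MiB regions, so anything over 2 MiB would be humongous.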

You can also see the heap region size in your GC logs:

14.414: [G1Ergonomics (Concurrent Cycles) request concurrent cycle initiation, reason: requested by GC cause, GC cause: Metadata GC Threshold]
{Heap before GC invocations=0 (full 0):
 garbage-first heap   total 20971520K, used 212992K [0x00000002c0000000, 0x00000002c0805000, 0x00000007c0000000)
  region size 8192K, 27 young (221184K), 0 survivors (0K)

In this case it's 8192K.
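
A quick way to pull both pieces of information out of the log file (a sketch, assuming the log was written to gc.log):

# Show the configured region size (printed with each heap summary)
grep -i "region size" gc.log | head -1

# Count and show the most recent humongous allocation requests
grep -c "humongous allocation" gc.log
grep "humongous allocation" gc.log | tail -5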

If you do see humongous allocations, then you can either increase the heap size (the -Xmx parameter) or increase the region size:

-XX:G1HeapRegionSize=32M

32M is the max.

Ideally, that's not the solution; the real solution is to get those mutation sizes down, but increasing the region size should buy you some leeway. If it doesn't, then you may have a situation where one particular object is causing the headaches. I've seen that before as well. To identify that issue, you'll need to get a memory dump just before the problem hits, which is not easy. In an ideal world, you'll have HeapDumpOnOutOfMemoryError set, the process will OOM, and then you'll have a heap dump to analyze with Eclipse MAT.
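
As a sketch of what that could look like (the dump path and the pid placeholder are examples, not settings from the question):

# JVM settings: write a heap dump automatically if the process OOMs
-XX:+HeapDumpOnOutOfMemoryError
-XX:HeapDumpPath=/var/lib/cassandra/heapdump.hprof

# Or capture a dump on demand while the node is struggling
# (note: this pauses the JVM while the dump is written)
jmap -dump:live,format=b,file=/tmp/cassandra-heap.hprof <cassandra-pid>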

stevenlacerda
  • Nice answer! And the article on Medium does a great job of explaining what's happening. – Aaron Jul 14 '23 at 16:32
  • Nice, Steve, thanks for the feedback. I'll put these settings on one of the machines that's presenting the issue and give you an update by Wednesday. – Gabriel Dias Jul 17 '23 at 20:28
  • Hi, I had a chance to enable the GC logging and gather the information you highlighted. There were several entries for humongous allocations like the one you pointed out: 111940.481: [G1Ergonomics (Concurrent Cycles) do not request concurrent cycle initiation, reason: still doing mixed collections, occupancy: 39191576576 bytes, allocation request: 2147483664 bytes, threshold: 30923764515 bytes (45.00 %), source: concurrent humongous allocation]. Somehow, our region size was already set to 32M: region size 32768K, 102 young (3342336K), 18 survivors (589824K). Continuing in the next comment. – Gabriel Dias Jul 20 '23 at 01:12
  • So, do you have an article or guide for going down this path, i.e. tracking down the object that is causing this issue? Yes, the heap dump path may look a little hard, but we may take a shot at it. It makes sense, since our customer's data model is very unoptimized and we are pretty sure there are some huge objects/data causing Cassandra to misbehave. – Gabriel Dias Jul 20 '23 at 01:15