I am running a Cassandra 3.0.11 cluster with 3 DCs and 10 nodes.
I frequently see the following GC warnings:
WARN [Service Thread] 2021-02-10 14:03:10,219 GCInspector.java:282 - G1 Young Generation GC in 1317ms. G1 Eden Space: 4546625536 -> 0; G1 Old Gen: 22573336584 -> 25140250632; G1 Survivor Space: 1124073472 -> 721420288;
WARN [Service Thread] 2021-02-10 14:03:11,916 GCInspector.java:282 - G1 Young Generation GC in 1382ms. G1 Eden Space: 989855744 -> 0; G1 Old Gen: 25140250632 -> 26364987400; G1 Survivor Space: 721420288 -> 218103808;
WARN [Service Thread] 2021-02-10 14:03:49,801 GCInspector.java:282 - G1 Young Generation GC in 1072ms. G1 Eden Space: 4496293888 -> 0; G1 Old Gen: 17078798632 -> 19586992416; G1 Survivor Space: 620756992 -> 654311424;
WARN [Service Thread] 2021-02-10 14:03:51,471 GCInspector.java:282 - G1 Young Generation GC in 1336ms. G1 Eden Space: 1056964608 -> 0; G1 Old Gen: 19586992416 -> 20870449448; G1 Survivor Space: 654311424 -> 218103808;
WARN [Service Thread] 2021-02-10 14:04:42,262 GCInspector.java:282 - G1 Young Generation GC in 8909ms. G1 Eden Space: 1493172224 -> 0; G1 Old Gen: 32195070248 -> 34099284256;
WARN [Service Thread] 2021-02-10 14:04:44,990 GCInspector.java:282 - G1 Young Generation GC in 2520ms. G1 Old Gen: 34099284256 -> 34317388064; G1 Survivor Space: 218103808 -> 0;
WARN [Service Thread] 2021-02-10 14:04:47,245 GCInspector.java:282 - G1 Old Generation GC in 28836ms. G1 Old Gen: 34317388064 -> 11666582136; Metaspace: 49839232 -> 49835448
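To make the pattern in these warnings easier to see, the lines can be summarized with a small script. The following is only a minimal sketch, assuming the GCInspector line format shown above. The names `parse_gc_line` and `old_gen_delta_mb` are hypothetical helpers, not part of Cassandra.

```python
import re

# Matches the tail of a GCInspector warning, e.g.
# "... - G1 Young Generation GC in 1317ms. G1 Eden Space: 4546625536 -> 0; ..."
GC_LINE = re.compile(
    r"- (?P<collector>G1 \w+ Generation) GC in (?P<ms>\d+)ms\.(?P<pools>.*)"
)
# Matches one "<pool name>: <before> -> <after>" entry.
POOL = re.compile(r"(?P<name>[\w ]+?): (?P<before>\d+) -> (?P<after>\d+)")


def parse_gc_line(line):
    """Return collector, pause and per-pool sizes, or None if not a GC line."""
    m = GC_LINE.search(line)
    if m is None:
        return None
    pools = {
        p.group("name").strip(): (int(p.group("before")), int(p.group("after")))
        for p in POOL.finditer(m.group("pools"))
    }
    return {
        "collector": m.group("collector"),
        "pause_ms": int(m.group("ms")),
        "pools": pools,
    }


def old_gen_delta_mb(event):
    """Old Gen growth (positive) or reclaim (negative) in MiB for one event."""
    before, after = event["pools"]["G1 Old Gen"]
    return (after - before) / 1024**2
```

Run over the lines above, this shows Old Gen growing by roughly 1 to 2.5 GB on each young collection until the 28836ms Old Generation GC brings it back down to about 11.6 GB.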
I am using G1GC with a 32 GB heap. Because of these long GC pauses, I often see dropped mutations:
Pool Name                     Active   Pending    Completed   Blocked  All time blocked
MutationStage                      0         0   1747789164         0                 0
ViewMutationStage                  0         0            0         0                 0
ReadStage                          0         0     12399767         0                 0
RequestResponseStage               0         0    627930907         0                 0
ReadRepairStage                    0         0        60775         0                 0
CounterMutationStage               0         0            0         0                 0
MiscStage                          0         0            0         0                 0
CompactionExecutor                 0         0      2101437         0                 0
MemtableReclaimMemory              0         0         4381         0                 0
PendingRangeCalculator             0         0           66         0                 0
GossipStage                        0         0      1350977         0                 0
SecondaryIndexManagement           0         0            0         0                 0
HintsDispatcher                    0         0        11394         0                 0
MigrationStage                     0         0       207917         0                 0
MemtablePostFlush                  0         0         3667         0                 0
ValidationExecutor                 0         0            0         0                 0
Sampler                            0         0            0         0                 0
MemtableFlushWriter                0         0         2926         0                 0
InternalResponseStage              0         0       420120         0                 0
AntiEntropyStage                   0         0            0         0                 0
CacheCleanupExecutor               0         0            0         0                 0
Native-Transport-Requests          3         0   3503749628         0          12323589
Message type           Dropped
READ                     66919
RANGE_SLICE               8260
_TRACE                       0
HINT                   2208871
MUTATION               5207285
COUNTER_MUTATION             0
BATCH_STORE                  0
BATCH_REMOVE                 0
REQUEST_RESPONSE         16491
PAGED_RANGE                  0
READ_REPAIR                  9
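For comparing nodes, the dropped-message section of this output can be turned into a dictionary and ranked. This is a rough sketch that assumes the two-column "Message type / Dropped" layout shown above. `parse_dropped` is a hypothetical helper name.

```python
def parse_dropped(lines):
    """Collect {message_type: dropped_count} from the dropped-message section."""
    dropped = {}
    in_section = False
    for line in lines:
        parts = line.split()
        # The section starts at the "Message type Dropped" header line.
        if parts[:2] == ["Message", "type"]:
            in_section = True
            continue
        # Inside the section each row is "<TYPE> <count>".
        if in_section and len(parts) == 2 and parts[1].isdigit():
            dropped[parts[0]] = int(parts[1])
    return dropped


def worst_offenders(dropped, top=3):
    """Return the message types with the most drops, largest first."""
    ranked = sorted(dropped.items(), key=lambda kv: kv[1], reverse=True)
    return ranked[:top]
```

On the numbers above, MUTATION and HINT dominate the drops, which points at the write path rather than reads.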
I have tried the sjk tool, and the threads I most often see at the top are SharedPool-Worker threads:
Monitoring threads ...
2021-02-10T14:01:27.672-0700 Process summary
process cpu=355.30%
application cpu=362.09% (user=322.78% sys=39.31%)
other: cpu=-6.79%
thread count: 823
heap allocation rate 1168mb/s
[000642] user=26.57% sys= 0.86% alloc= 119mb/s - SharedPool-Worker-10
[000647] user=23.41% sys= 0.93% alloc= 115mb/s - SharedPool-Worker-12
[000636] user=25.83% sys= 2.34% alloc= 111mb/s - SharedPool-Worker-4
[000634] user=20.25% sys= 0.27% alloc= 100mb/s - SharedPool-Worker-2
[000652] user=19.14% sys= 0.17% alloc= 99mb/s - SharedPool-Worker-19
[000648] user=19.14% sys= 0.19% alloc= 98mb/s - SharedPool-Worker-16
[000637] user=21.00% sys= 0.25% alloc= 94mb/s - SharedPool-Worker-5
[000633] user=12.82% sys= 2.51% alloc= 32mb/s - SharedPool-Worker-1
[000654] user= 7.25% sys= 0.76% alloc= 31mb/s - SharedPool-Worker-20
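To put the allocation rate in perspective, the per-thread figures can be grouped by pool and compared with the Eden size from the GC warnings. This is a back-of-the-envelope sketch, assuming that sjk's "mb/s" means MiB per second and using the 4546625536-byte Eden size from the first GC warning. The helper names are hypothetical.

```python
import re

# Matches sjk thread lines such as
# "[000642] user=26.57% sys= 0.86% alloc= 119mb/s - SharedPool-Worker-10"
THREAD_LINE = re.compile(r"alloc=\s*(?P<rate>\d+)mb/s - (?P<name>.+)$")


def alloc_by_pool(lines):
    """Sum allocation rates (mb/s) per thread pool, ignoring non-thread lines."""
    totals = {}
    for line in lines:
        m = THREAD_LINE.search(line)
        if m is None:
            continue
        # Strip the trailing worker index: "SharedPool-Worker-10" -> "SharedPool-Worker"
        pool = re.sub(r"-\d+$", "", m.group("name").strip())
        totals[pool] = totals.get(pool, 0) + int(m.group("rate"))
    return totals


def seconds_to_fill(eden_bytes, alloc_mb_per_s):
    """Approximate time for the given allocation rate to fill Eden."""
    return eden_bytes / (alloc_mb_per_s * 1024**2)
```

With the reported 1168mb/s, a 4546625536-byte Eden fills in under 4 seconds, which matches how closely spaced the young collections are in the warnings above.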
What is the best way to find out what is causing the heap to fill up and trigger these GCs?
Update: CPU info
~]$ lscpu
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 32
On-line CPU(s) list: 0-31
Thread(s) per core: 1
Core(s) per socket: 1
Socket(s): 32
NUMA node(s): 32
Vendor ID: GenuineIntel
CPU family: 6
Model: 61
Model name: Intel Core Processor (Broadwell, IBRS)
Stepping: 2
CPU MHz: 2095.320
BogoMIPS: 4190.64
Virtualization: VT-x
Hypervisor vendor: KVM
Virtualization type: full
L1d cache: 32K
L1i cache: 32K
L2 cache: 4096K
Total RAM: 176 GB
Established client connections on port 9042:
sudo netstat | grep 9042 | grep ESTABLISHED| wc -l
295