network-latency/ping degradation over time correlated with memory fragmentation

Question

There is an Online Charging System (OCS) that handles Diameter Ro and Gy traffic. The OCS receives Diameter CCR messages and answers with Diameter CCA messages. The response time (latency) of these answers degrades over time.

Normal average latency is 20 ms, but within one week it degrades to 70 ms, with increasingly frequent spikes of more than a few seconds. The cluster consists of several physical hosts. Each physical host runs two VMs (Red Hat KVM, RHEL): an Application VM (C++) and a Database VM (Java).

There is a strong correlation between latency degradation and memory fragmentation. For example, after a cache drop plus memory compaction, or after a VM restart, latency returns to normal, but within one week it degrades again to the numbers given above.
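To put a number on the fragmentation side of this correlation, here is a minimal Python sketch that reads /proc/buddyinfo and reports how much of the free memory still sits in larger contiguous blocks. The function names and the order-4 threshold are my own illustrative assumptions, not something taken from the OCS setup. Logging this value next to the latency samples, before and after a cache drop plus compaction, should make the correlation easier to show.

```python
from pathlib import Path


def parse_buddyinfo(text):
    """Return {(node, zone): [free block count for order 0, 1, 2, ...]}."""
    zones = {}
    for line in text.splitlines():
        parts = line.split()
        # Expected line shape: "Node 0, zone Normal 4381 1026 ..."
        if len(parts) < 5 or parts[0] != "Node" or parts[2] != "zone":
            continue
        node = int(parts[1].rstrip(","))
        zones[(node, parts[3])] = [int(n) for n in parts[4:]]
    return zones


def high_order_share(counts, min_order=4):
    """Fraction of free pages held in blocks of order >= min_order.

    min_order=4 is an arbitrary illustrative threshold (an assumption).
    A value near 0 means free memory is mostly small fragments.
    """
    pages = [n * (1 << order) for order, n in enumerate(counts)]
    total = sum(pages)
    return sum(pages[min_order:]) / total if total else 0.0


def report(path="/proc/buddyinfo"):
    """Print the high-order share for every zone on this machine (Linux only)."""
    for (node, zone), counts in parse_buddyinfo(Path(path).read_text()).items():
        print(f"node {node} zone {zone}: {high_order_share(counts):.1%}")
```

Calling `report()` on the host and inside each VM prints one line per memory zone. A share that shrinks over the week and jumps back after compaction would match the behaviour described above.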

If a VM backup is taken, latency degrades immediately afterwards.

Apart from the Diameter latency degradation, there is a correlated ping RTT degradation, with spikes of up to 30 seconds. The ping RTT degradation between VMs on the same physical host is worse than between VMs on different hosts. There is no ping RTT degradation between the physical hosts themselves.

Any ideas on how to fix this latency issue are welcome!
