I'm investigating an OOM kill of one of the services in my Kubernetes cluster.
One of the first things I did was look at a Grafana plot of the memory usage of the pod that was killed, and to my surprise I saw a big spike in memory usage right around the time the OOM happened. However, the pod wasn't running any computations at that time, so the spike was very confusing (the pod does have a memory leak that made it slowly approach the memory limit, but the spike was not part of that).
I looked closer at the spike using Prometheus and learnt that for a short moment, while the old pod is being replaced by a new one, kubelet reports metrics for both of them, and because of the sum by (container) used in the Grafana plot this briefly doubled value shows up as a spike.
Without aggregation: container_memory_rss{pod="XXX", container!="POD", container!=""}
With aggregation: sum by (container) (container_memory_rss{pod="XXX", container!="POD", container!=""})
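For what it's worth, one query-side workaround I'm considering (just a sketch, I'm not sure it's the right fix) is switching the aggregation from sum to max, so the briefly overlapping series aren't added together:

max by (container) (container_memory_rss{pod="XXX", container!="POD", container!=""})

But that only masks the double-counting in the dashboard rather than explaining why the overlap happens in the first place.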
I was wondering if the overlap can be avoided somehow, and whether it stems from some incorrect configuration on my side or whether it's a behaviour of kubelet or Prometheus that simply cannot be avoided.
Thanks!