I want to find out the current memory usage of my pod running on an EKS cluster. I have Metrics Server and Prometheus installed.

When I run "kubectl top pods", I get a memory usage of about 2.5 GB:

sh-4.2$ kubectl top pods liink-goquorum-node-learner-quorum-0 -n my-namespace
NAME                                   CPU(cores)   MEMORY(bytes)
liink-goquorum-node-learner-quorum-0   128m         2577Mi

The metrics server documentation advises against using this as an accurate source of metrics.

When I exec into the pod and read the cgroup memory usage directly, the value is vastly different:

root@container[liink-goquorum-node-learner-quorum-0]:/sys/fs/cgroup/memory# cat /sys/fs/cgroup/memory/memory.usage_in_bytes
8921739264

When I check in Prometheus (as advised on the Metrics Server GitHub page), the memory usage is close to the value I get inside the container:

container_memory_usage_bytes{cluster="liink-uat",namespace="<my-namespace>",container="",pod="liink-goquorum-node-learner-quorum-0"} => 8923037696

However, when I run the query below, the value is closer to what Metrics Server reports:

container_memory_working_set_bytes{cluster="liink-uat",namespace="<my-namespace>",container="",pod="liink-goquorum-node-learner-quorum-0"}  => 2704633856 
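
To compare these figures on the same scale as kubectl top, which reports memory in Mi, a quick conversion helps. This is only a minimal sketch using the values quoted above; the helper name to_mib is my own, not part of any tool.

```python
MIB = 1024 * 1024  # kubectl top reports memory in Mi (mebibytes)


def to_mib(n_bytes: int) -> float:
    """Convert a byte count to mebibytes."""
    return n_bytes / MIB


# Values copied from the kubectl and Prometheus output quoted above.
kubectl_top_mib = 2577
usage_bytes = 8923037696        # container_memory_usage_bytes
working_set_bytes = 2704633856  # container_memory_working_set_bytes

print(f"usage:       {to_mib(usage_bytes):.0f}Mi")
print(f"working set: {to_mib(working_set_bytes):.0f}Mi")
print(f"kubectl top: {kubectl_top_mib}Mi")
```

The working set lands within a few Mi of the kubectl top figure, while usage is roughly 8.3 Gi. The small gap is expected because the samples were taken at different times.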

So which one is the best source for the current memory usage of the pod? Also, if Metrics Server can't be trusted for accurate usage, are the autoscaling decisions based on it wrong? What am I missing here?

Below are the explanations of these metrics from the cAdvisor documentation, which do not make the difference clear to me.

container_memory_usage_bytes: Current memory usage, including all memory regardless of when it was accessed

container_memory_working_set_bytes: Current working set

I am also linking the cgroups explanation of memory usage, https://www.kernel.org/doc/Documentation/cgroup-v1/memory.txt, which leads me to believe that the current container utilization is closer to what I get from the cat command.

Can someone help me understand the correct way to get the current memory utilization, and whether I am missing something here?

Gaurav Parashar

2 Answers

Both are correct; they just report different categories of memory utilization.

My understanding:

  • container_memory_working_set_bytes is the memory your application is actively using: total usage minus the inactive file cache, which the kernel can reclaim easily. This memory is critical for the application to function, so this is the metric that is monitored most of the time. Metrics Server reports it (so kubectl top and autoscaling are based on it), and it is the value to watch against the memory limit for OOM kills.
  • container_memory_usage_bytes includes, in addition to the memory allocated to your application, other memory charged to the pod, most notably filesystem cache that has not been accessed recently. In theory this extra memory is not critical, and the kernel can free it when needed.

I believe container_memory_usage_bytes is closer to the "current memory usage for the pod". But keep in mind that it is not the same as the memory usage of the application in the pod.
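
The relationship between the two metrics can be sketched from the cgroup v1 files the question already reads. As far as I know, cAdvisor derives the working set as usage minus total_inactive_file from memory.stat, floored at zero. The function names below and the total_inactive_file figure are my own illustrative assumptions, not values from the question.

```python
from pathlib import Path


def parse_memory_stat(text: str) -> dict[str, int]:
    """Parse the 'key value' lines of a cgroup v1 memory.stat file."""
    stats = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2:
            stats[parts[0]] = int(parts[1])
    return stats


def working_set(usage_bytes: int, inactive_file_bytes: int) -> int:
    """Usage minus inactive file cache, never below zero."""
    return max(usage_bytes - inactive_file_bytes, 0)


def read_working_set(cgroup_dir: str = "/sys/fs/cgroup/memory") -> int:
    """Compute the working set from inside a container (cgroup v1 layout assumed)."""
    root = Path(cgroup_dir)
    usage = int((root / "memory.usage_in_bytes").read_text())
    stats = parse_memory_stat((root / "memory.stat").read_text())
    return working_set(usage, stats.get("total_inactive_file", 0))


# usage is the value from the question; the inactive file figure is hypothetical,
# chosen only to illustrate how a large page cache explains the gap.
print(working_set(8921739264, 6217105408))
```

Calling read_working_set() inside the container should give a number close to what kubectl top shows. On cgroup v2 nodes the file names differ (memory.current, and inactive_file in memory.stat), so the paths above would need adjusting.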

markalex
  • This would make sense, but can we control this allotment? Going by your explanation, my application only has 2.5 GB of memory available while the pod has 8 GB. Can we increase container_memory_working_set_bytes somehow, perhaps through some configuration? How can I make the application consume the entire available memory? – Gaurav Parashar Aug 30 '23 at 01:24

container_memory_working_set_bytes shows the amount of memory the container has recently accessed. If container_memory_working_set_bytes reaches the configured memory limit, the container is killed by the OOM killer, since it cannot continue working efficiently without additional memory.

As I understand it, container_memory_usage_bytes shows the total memory currently charged to the container, including memory that is no longer actively used. For example, file data that was read only once stays in the page cache and keeps counting toward usage until the kernel reclaims it. So container_memory_usage_bytes may significantly exceed container_memory_working_set_bytes.

If container_memory_usage_bytes grows over time at a constant rate, and its value exceeds container_memory_working_set_bytes, then there is a good chance that the application has a memory leak (although page cache that grows steadily from file I/O produces the same pattern). If container_memory_usage_bytes remains constant over time, then everything is OK, even if its value exceeds container_memory_working_set_bytes.
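
The "grows at a constant rate" check above can be approximated by fitting a straight line through samples of the metric and looking at its slope. This is a rough sketch, not a leak detector: the function names, the sample data and the threshold are all hypothetical, and a positive slope is only a hint to investigate further.

```python
def slope_per_second(samples: list[tuple[float, float]]) -> float:
    """Least-squares slope of (timestamp_seconds, bytes) samples."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_t = sum(t for t, _ in samples) / n
    mean_v = sum(v for _, v in samples) / n
    num = sum((t - mean_t) * (v - mean_v) for t, v in samples)
    den = sum((t - mean_t) ** 2 for t, _ in samples)
    if den == 0:
        raise ValueError("all samples share the same timestamp")
    return num / den


def grows_steadily(samples: list[tuple[float, float]], min_bytes_per_hour: float) -> bool:
    """True if the fitted growth rate is at least the given (hypothetical) threshold."""
    return slope_per_second(samples) * 3600 >= min_bytes_per_hour


# Hypothetical samples: one reading per minute, growing by 1 MiB each time.
MIB = 1024 * 1024
growing = [(60.0 * i, 8000 * MIB + i * MIB) for i in range(10)]
flat = [(60.0 * i, 8000 * MIB) for i in range(10)]

print(grows_steadily(growing, min_bytes_per_hour=10 * MIB))
print(grows_steadily(flat, min_bytes_per_hour=10 * MIB))
```

In practice the samples would come from a Prometheus range query on container_memory_usage_bytes. Running the same check on container_memory_working_set_bytes helps tell growing page cache apart from growth in memory the application is actually holding.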

valyala