
We are running a medium-sized AWS EKS cluster (~120 kubelet nodes) running mostly Go services. The services deployed in the cluster are quite busy, handling millions of calls per hour. Each node runs the same version of the standard Amazon Linux kernel:

Linux 4.14.203-156.332.amzn2.x86_64 #1 SMP Fri Oct 30 19:19:33 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux

Some time ago we noticed in our Grafana dashboards that on each kubelet node TCP mem (bytes) steadily grows over time without ever dropping.

[Grafana graph: node-level TCP memory usage climbing steadily over time]
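For reference, the node-level counters that a panel like this is typically derived from can be read directly on the host; a minimal sketch, assuming the exporter scrapes the standard sockstat/tcp_mem sources:

```
# Node-wide TCP memory accounting. The "mem" value on the TCP line is in
# pages (4 KiB on this kernel), so mem * 4096 should roughly track a
# "TCP mem (bytes)" panel if the exporter reads the same counter.
cat /proc/net/sockstat

# The tcp_mem thresholds (low, pressure, high) are also in pages; once the
# "high" threshold is crossed the kernel starts refusing new TCP buffer
# allocations.
sysctl net.ipv4.tcp_mem
```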

We managed to pin this issue down to a single, but rather "large" (in terms of codebase size) Go service. We now recycle this service regularly whilst looking for the cause of the leak.

I'm now starting to question whether I understand this issue correctly from the host, i.e. Linux kernel, PoV, and would like to avoid chasing a mirage.

My understanding as of now is that the TCP memory leak can be on either the receiving or the sending side. I suspect these are bytes allocated for sockets (somewhere in the kernel) which remain open indefinitely, with data queued in their buffers without ever being "drained". Is this correct, or am I fooling myself here?
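To make "queued without being drained" concrete, my working assumption is that for an established socket these two queues show up in `ss` as Recv-Q and Send-Q:

```
# For established TCP sockets:
#   Recv-Q: bytes the kernel has received but the application has not read yet
#   Send-Q: bytes the application has sent that the peer has not yet ACKed
ss -tn
```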

If it is, is there a way I can inspect this data somehow? By "inspecting" I mean finding the sockets holding this data.
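The closest thing to per-socket inspection I am aware of is the skmem block that `ss` prints with `-m` (field meanings below are taken from the ss man page, so treat this as a sketch rather than gospel):

```
# -t TCP, -n numeric, -m socket memory, -p owning process.
# Each socket gets a skmem:(r..,rb..,t..,tb..,f..,w..,o..,bl..,d..) block:
#   r  = memory allocated for received packets,  rb = receive buffer limit
#   t  = memory for packets already handed to L3, tb = send buffer limit
#   w  = memory queued for sending but not yet handed to L3
#   f  = forward-allocated cache, o = socket options, bl = backlog queue
#   d  = packets dropped before being demultiplexed into the socket
ss -tnmp
```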

Chasing open sockets by running lsof on the host side of things has not given me many leads to follow up on, but one thing I have noticed is that there are a lot of sockets "inside" the service Pod in the TIME_WAIT state. I believe these should not be much of a concern, but just to make sure I'm not missing anything I dropped `net.ipv4.tcp_fin_timeout` to a much lower value than the default (60s -> 10s) to recycle sockets faster.
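For completeness, a quick way to sanity-check that setting and count sockets per state (a sketch; `-H` just suppresses the header and needs a reasonably recent iproute2):

```
# Current FIN timeout in seconds.
sysctl net.ipv4.tcp_fin_timeout

# Count TCP sockets per state (TIME_WAIT included).
ss -Htan | awk '{count[$1]++} END {for (s in count) print s, count[s]}'

# Or just count the TIME_WAIT sockets.
ss -Htan state time-wait | wc -l
```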

Now, I understand this is our service leaking the memory, but I'm looking for answers to the following questions:

  • is my thinking about this problem from the kernel PoV correct, i.e. would open sockets/FDs whose buffers haven't been cleared (read/write) be the cause of this?
  • if the answer to the above is yes, is there any way to tell, on a busy server, which of these sockets have buffers allocated but not drained, and on which end (send/recv)? (see the sketch after this list)
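And the sketch mentioned in the second bullet: the crude approach I can think of is sorting sockets by their queue sizes (column positions can vary between `ss` versions, and the port in the last command is just a placeholder):

```
# TCP sockets with a non-empty receive or send queue, biggest first.
# With -Htan the columns are: State Recv-Q Send-Q Local Peer.
# A persistently large Recv-Q points at the local reader not draining the
# buffer; a persistently large Send-Q points at the peer not ACKing data.
ss -Htan | awk '$2 > 0 || $3 > 0' | sort -k2,2nr -k3,3nr | head -20

# Then drill into a specific suspect with process and skmem details:
ss -tnmp 'sport = :8080'
```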

Thanks

milosgajdos
  • Do you also control the clients calling to that service or is this more of a public HTTP kind of service? – Ginnungagap Dec 01 '20 at 08:07
  • It is both, actually. There is a public API, indeed, but we also have our own clients which interact with the API. – milosgajdos Dec 01 '20 at 09:57
  • For inspecting sockets, using an appropriate tool like `ss` (try `ss -aemnpt`) will give you the state and buffered data size for each socket. TIME_WAIT sockets are normal and should consume fairly little memory. – Ginnungagap Dec 02 '20 at 07:46
  • I have been so far parsing the output from `ss -tapmi`, but the `ss` developers certainly have not made life easier for people with mad debugging requirements. I'm more wondering about the underlying socket data management. Is my thinking about it correct? Is the reason for this that socket buffers are filling up in kernel space? – milosgajdos Dec 02 '20 at 12:04

0 Answers