
My Go application is deployed in a Docker container with 1 GB of RAM, but over time the process throws an OOM.

I suspected a memory leak, but after analyzing the heap profile of the process with pprof, something seems off.

| runtime.MemStats | Time T1   | Time T2   | Time T3   |
|------------------|-----------|-----------|-----------|
| Sys              | 293902584 | 432449784 | 570800376 |
| HeapAlloc        | 47299656  | 63375376  | 68294696  |
| HeapSys          | 263323648 | 397541376 | 531496960 |
| HeapIdle         | 175882240 | 297140224 | 431710208 |
| HeapInuse        | 87441408  | 100401152 | 99786752  |
| HeapReleased     | 153509888 | 297140224 | 431677440 |

My understanding is as follows (a minimal sketch of reading these fields directly follows this list):

  • HeapAlloc (the heap memory currently in use by the Go process) always stays below 70 MB.
  • HeapSys is the heap memory obtained from the OS. Should this value decrease when heap memory is released by the GC?
  • HeapInuse also looks fine here: it is somewhat larger than HeapAlloc because the in-use spans carry some spare capacity, which the process can use for new allocations without asking the OS for more memory.
  • HeapReleased says how much memory has been released back to the OS.
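
For reference, the same runtime.MemStats fields shown in the table can also be sampled in-process. This is a minimal, hypothetical sketch (not from our actual service) that just logs those fields periodically:

```go
// Hypothetical sketch (not from our actual service): periodically log the same
// runtime.MemStats fields that appear in the heap profile above.
package main

import (
	"log"
	"runtime"
	"time"
)

func main() {
	for {
		var m runtime.MemStats
		runtime.ReadMemStats(&m) // briefly stops the world; fine for coarse sampling
		log.Printf("Sys=%d HeapAlloc=%d HeapSys=%d HeapIdle=%d HeapInuse=%d HeapReleased=%d",
			m.Sys, m.HeapAlloc, m.HeapSys, m.HeapIdle, m.HeapInuse, m.HeapReleased)
		time.Sleep(30 * time.Second)
	}
}
```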

In our case, HeapSys keeps growing until the 1 GB limit is reached and the OOM is thrown.

My question is: shouldn't HeapSys shrink after space is released by the GC? That is not happening here. The top output is in sync with HeapSys and shows the process using almost that much memory. Or am I missing something?
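
To illustrate what looks off to me, here is a quick sketch (just arithmetic over the three samples above, values hard-coded) showing that the memory the heap actually retains, HeapSys minus HeapReleased, stays around 100 MB while HeapSys itself keeps climbing:

```go
// Arithmetic over the three samples posted above (values hard-coded for illustration).
// retained = HeapSys - HeapReleased, i.e. HeapInuse plus idle spans not yet returned to the OS.
package main

import "fmt"

func main() {
	type sample struct{ heapSys, heapIdle, heapInuse, heapReleased uint64 }
	samples := []struct {
		name string
		s    sample
	}{
		{"T1", sample{263323648, 175882240, 87441408, 153509888}},
		{"T2", sample{397541376, 297140224, 100401152, 297140224}},
		{"T3", sample{531496960, 431710208, 99786752, 431677440}},
	}
	for _, x := range samples {
		retained := x.s.heapSys - x.s.heapReleased        // physical memory the heap still holds
		unreleasedIdle := x.s.heapIdle - x.s.heapReleased // idle spans not yet returned to the OS
		fmt.Printf("%s: HeapSys=%d retained=%d (HeapInuse=%d + unreleased idle=%d)\n",
			x.name, x.s.heapSys, retained, x.s.heapInuse, unreleasedIdle)
	}
}
```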

Note: We are using Go 1.13

Edit: Logs for OOM

2022-06-10T08:03:19.679701-06:00 myserver kernel: rbace-server invoked oom-killer: gfp_mask=0xd0, order=0, oom_score_adj=0
2022-06-10T08:03:19.679824-06:00 myserver kernel: CPU: 11 PID: 3622 Comm: rbace-server Kdump: loaded Tainted: G        W      ------------ T 3.10.0-1160.59.1.el7.x86_64 #1
2022-06-10T08:03:19.679865-06:00 myserver kernel: Hardware name: Dell Inc. PowerEdge R740xd/0DY2X0, BIOS 2.10.0 11/12/2020
2022-06-10T08:03:19.679905-06:00 myserver kernel: Call Trace:
2022-06-10T08:03:19.679945-06:00 myserver kernel: [<ffffffffa37865b9>] dump_stack+0x19/0x1b
2022-06-10T08:03:19.679986-06:00 myserver kernel: [<ffffffffa3781658>] dump_header+0x90/0x229
2022-06-10T08:03:19.680027-06:00 myserver kernel: [<ffffffffa329da38>] ? ep_poll_callback+0xf8/0x220
2022-06-10T08:03:19.680068-06:00 myserver kernel: [<ffffffffa31c1fe6>] ? find_lock_task_mm+0x56/0xc0
2022-06-10T08:03:19.680108-06:00 myserver kernel: [<ffffffffa323d2d8>] ? try_get_mem_cgroup_from_mm+0x28/0x60
2022-06-10T08:03:19.680146-06:00 myserver kernel: [<ffffffffa31c254d>] oom_kill_process+0x2cd/0x490
2022-06-10T08:03:19.680185-06:00 myserver kernel: [<ffffffffa32416cc>] mem_cgroup_oom_synchronize+0x55c/0x590
2022-06-10T08:03:19.680222-06:00 myserver kernel: [<ffffffffa3240b30>] ? mem_cgroup_charge_common+0xc0/0xc0
2022-06-10T08:03:19.680261-06:00 myserver kernel: [<ffffffffa31c2e34>] pagefault_out_of_memory+0x14/0x90
2022-06-10T08:03:19.680301-06:00 myserver kernel: [<ffffffffa377fb95>] mm_fault_error+0x6a/0x157
2022-06-10T08:03:19.680337-06:00 myserver kernel: [<ffffffffa37948d1>] __do_page_fault+0x491/0x500
2022-06-10T08:03:19.680379-06:00 myserver kernel: [<ffffffffa3794975>] do_page_fault+0x35/0x90
2022-06-10T08:03:19.680420-06:00 myserver kernel: [<ffffffffa3790778>] page_fault+0x28/0x30
2022-06-10T08:03:19.680717-06:00 myserver kernel: Memory cgroup out of memory: Kill process 85344 (rbace-server) score 1035 or sacrifice child
2022-06-10T08:03:19.680770-06:00 myserver kernel: Killed process 3518 (rbace-server), UID 0, total-vm:2506864kB, anon-rss:1115028kB, file-rss:0kB, shmem-rss:0kB
2022-06-10T08:03:19.906133-06:00 myserver kernel: XFS (dm-46): Unmounting Filesystem
2022-06-10T08:03:19.939645-06:00 myserver kernel: device-mapper: ioctl: remove_all left 100 open device(s)
  • "shouldn't space from HeapSys be reduced after space is released by GC?" Nope. It will be reduced after the OS reclaims the memory. The OOM error is odd though; is that a Go allocation error? If so, what is it trying to allocate when it fails? – Adrian Jun 22 '22 at 20:36
  • @Adrian, there is no Go allocation error in the logs. – codingenious Jun 23 '22 at 07:51
  • Then where are you seeing an OOM error? – Adrian Jun 23 '22 at 14:14
  • @Adrian Sorry, I misinterpreted your message. I have attached the logs when we get OOM – codingenious Jun 23 '22 at 17:33
  • That's oom-killer doing that, not your application. The process isn't throwing OOM at all. oom-killer is just a process that sigkills processes based on memory usage. It knows nothing about garbage collection. – Adrian Jun 23 '22 at 17:36
  • Total OS memory is 375 GB and around 200 GB is free. The docker limit is 1GB. This limit is being hit by the go app. Could this be a reason oom-killer is sigkilling the process? – codingenious Jun 23 '22 at 17:57
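
A hypothetical sketch of how one could confirm from inside the container that it is the cgroup limit being hit (assuming cgroup v1, which this 3.10 el7 kernel uses), comparing the cgroup's limit and usage with what the Go runtime reports:

```go
// Hypothetical check (assumes cgroup v1 paths): compare the container's memory
// limit and current usage with what the Go runtime reports for this process.
package main

import (
	"fmt"
	"io/ioutil"
	"runtime"
	"strconv"
	"strings"
)

// readBytes parses a single integer value from a cgroup file, returning 0 on error.
func readBytes(path string) uint64 {
	b, err := ioutil.ReadFile(path)
	if err != nil {
		return 0
	}
	v, _ := strconv.ParseUint(strings.TrimSpace(string(b)), 10, 64)
	return v
}

func main() {
	limit := readBytes("/sys/fs/cgroup/memory/memory.limit_in_bytes")
	usage := readBytes("/sys/fs/cgroup/memory/memory.usage_in_bytes")
	var m runtime.MemStats
	runtime.ReadMemStats(&m)
	fmt.Printf("cgroup limit=%d usage=%d  runtime Sys=%d HeapReleased=%d\n",
		limit, usage, m.Sys, m.HeapReleased)
}
```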

0 Answers