
I am running OpenStack Victoria deployed with Kolla Ansible; all components are containerised.

The compute node is oom-killing (oom_kill) guests when memory is maxed out. Is there a way to avoid this? Other hypervisors work fine without this issue. I am using CentOS 8.3. Please let me know if there is a way to avoid it.

Errors:

    Feb 27 12:18:15 server1 kernel: neutron-openvsw invoked oom-killer: gfp_mask=0x6200ca(GFP_HIGHUSER_MOVABLE), order=0, oom_score_adj=0
    Feb 27 12:18:15 server1 kernel: oom_kill_process.cold.28+0xb/0x10
    Feb 27 12:18:15 server1 kernel: [ pid ]   uid  tgid total_vm      rss pgtables_bytes swapents oom_score_adj name
    Feb 27 12:18:15 server1 kernel: oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),cpuset=395bde13c7e0570ef36df008bc028d8701fd76c1b56e2a56afaf254fd53d0043,mems_allowed=0-1,global_oom,task_memcg=/machine/qemu-33-instance-000000dc.libvirt-qemu,task=qemu-kvm,pid=2301214,uid=42436
    Feb 27 12:18:17 server1 kernel: oom_reaper: reaped process 2301214 (qemu-kvm), now anon-rss:0kB, file-rss:516kB, shmem-rss:0kB
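For context, OOM events like these can be pulled from the kernel log on the compute node with standard tools; nothing below is Kolla-specific:

```
# Recent OOM killer activity from the kernel ring buffer and the journal
dmesg -T | grep -iE 'oom|killed process'
journalctl -k --since "2 hours ago" | grep -i oom
```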

sar memory utilisation:

    10:10:05 AM kbmemfree   kbavail kbmemused  %memused kbbuffers  kbcached  kbcommit   %commit  kbactive   kbinact   kbdirty
    12:00:05 PM    877228         0 393690660     99.78         0    500284 2254123104    542.46 374227828  12705256         0
    12:10:04 PM    866416         0 393701472     99.78         0    501844 2254259520    542.49 374233440  12704360         0
    12:20:04 PM 301182096 300028052  93385792     23.67         0    705140 1938778932    466.57  83794716   5028804         8
    12:30:04 PM 301085624 299970968  93482264     23.69         0    779220 1939000988
  • This sounds like [this thread](http://lists.openstack.org/pipermail/openstack-discuss/2022-February/027350.html) in the openstack-discuss mailing list. – eblock Mar 01 '22 at 08:44

1 Answer


Answering my own question, as I found a resolution.

The oom kills were happening even when the free stats looked good: on a host with 256G of RAM only 140G was used, and around 100G still showed up as free.

    [root@serverxx ~]# free -g
                  total        used        free      shared  buff/cache   available
    Mem:            251         140         108           0           2         108
    Swap:            19           6          13

The oom kills were triggered by the high %commit in the sar stats: when the kernel runs out of memory it starts targeting the instances with the largest memory footprint to free it up.

To avoid oom kills of the guest instances with larger memory footprints, I set the following sysctl: vm.oom_kill_allocating_task=1. With this set, the kernel kills the task that triggered the out-of-memory condition instead of scanning for the process with the worst oom score.
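A minimal sketch of applying that sysctl, assuming a stock CentOS 8 layout (the drop-in file name 99-oom.conf is arbitrary):

```
# Apply immediately on the running compute node
sysctl -w vm.oom_kill_allocating_task=1

# Persist across reboots
echo "vm.oom_kill_allocating_task = 1" > /etc/sysctl.d/99-oom.conf
sysctl --system    # reload all sysctl configuration files
```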

When I ran sar -r, the %commit was far higher than what the system can actually allocate, and from ps I figured out that the culprit was a cinder-backup container that the kolla-ansible deployment creates by default but which had never been configured.
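For reference, the commit figures reported by sar can be cross-checked against /proc/meminfo; these are standard kernel fields, nothing deployment-specific assumed:

```
# kbcommit / %commit columns, one-second samples
sar -r 1 3

# Committed_AS vs. CommitLimit straight from the kernel
grep -iE 'commitlimit|committed_as' /proc/meminfo
```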

I had never configured the cinder backup service; it was just running. It turned out that the unconfigured container was eating up all the memory over time, as can be seen in the VSZ column of the ps output.

ps -eo args,comm,pid,ppid,rss,vsz --sort vsz (sorted by the VSZ column)

The VSZ of the cinder-backup process is extremely high:

    COMMAND                     COMMAND             PID    PPID   RSS        VSZ
    /usr/libexec/qemu-kvm -name qemu-kvm        1916998   47324  8094744    13747664
    /var/lib/kolla/venv/bin/pyt cinder-backup     43689   43544  170999912  870274784
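A sketch of how stopping the unused backup service can look on a Kolla host; the container name cinder_backup and the enable_cinder_backup flag are the Kolla Ansible defaults, not something from the output above, so adjust them to your deployment:

```
# Stop the idle container on the affected node
docker stop cinder_backup

# To disable the service for good, set this in /etc/kolla/globals.yml:
#   enable_cinder_backup: "no"
# and then re-run the deployment:
kolla-ansible -i <inventory> reconfigure
```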

The sar %commit stats came back to normal after the backup container was stopped, and everything has been fine since: %commit dropped from 1083.46 to 14.21 after the changes.

    02:00:37 PM kbmemfree   kbavail kbmemused  %memused kbbuffers  kbcached  kbcommit   %commit  kbactive   kbinact   kbdirty
    03:00:37 PM  48843576  49998184  82890508     62.92      9576   5949348 1427280428   1083.46  75646888   2797388       324
    03:10:37 PM  48829248  49991284  82904836     62.93      9576   5956544 1427343664   1083.50  75653556   2804592       116
    03:20:22 PM 120198612 121445516  11535472      8.76      9576   6042892   18733688     14.22   4887688   2854704        80
    03:30:37 PM 120189464 121444176  11544620      8.76      9576   6050200   18725820     14.21   4887752   2862248        88