First, the log:
[Wed aug 22 01:11:31 2018] java invoked oom-killer: gfp_mask=0x14280ca(GFP_HIGHUSER_MOVABLE|__GFP_ZERO), nodemask=(null), order=0, oom_score_adj=0
[Wed aug 22 01:11:31 2018] java cpuset=/ mems_allowed=0-1
[Wed aug 22 01:11:31 2018] CPU: 5 PID: 4726 Comm: java Tainted: G I 4.13.0-32-generic #35~16.04.1-Ubuntu
[Wed aug 22 01:11:31 2018] Hardware name: HP ProLiant DL380 G7, BIOS P67 05/14/2010
[Wed aug 22 01:11:31 2018] Call Trace:
[Wed aug 22 01:11:31 2018] dump_stack+0x63/0x8b
[Wed aug 22 01:11:31 2018] dump_header+0x97/0x225
[Wed aug 22 01:11:31 2018] ? security_capable_noaudit+0x45/0x60
[Wed aug 22 01:11:31 2018] oom_kill_process+0x219/0x420
[Wed aug 22 01:11:31 2018] out_of_memory+0x11d/0x4b0
[Wed aug 22 01:11:31 2018] __alloc_pages_slowpath+0xd2e/0xe10
[Wed aug 22 01:11:31 2018] ? apic_timer_interrupt+0x101/0x1c0
[Wed aug 22 01:11:31 2018] ? apic_timer_interrupt+0xd7/0x1c0
[Wed aug 22 01:11:31 2018] __alloc_pages_nodemask+0x260/0x280
[Wed aug 22 01:11:31 2018] alloc_pages_vma+0x88/0x1e0
[Wed aug 22 01:11:31 2018] __handle_mm_fault+0xc7b/0x1070
[Wed aug 22 01:11:31 2018] handle_mm_fault+0xcc/0x1c0
[Wed aug 22 01:11:31 2018] __do_page_fault+0x258/0x4f0
[Wed aug 22 01:11:31 2018] do_page_fault+0x22/0x30
[Wed aug 22 01:11:31 2018] ? page_fault+0x36/0x60
[Wed aug 22 01:11:31 2018] page_fault+0x4c/0x60
[Wed aug 22 01:11:31 2018] RIP: 0033:0x7f2bc96dd62e
[Wed aug 22 01:11:31 2018] RSP: 002b:00007f2b4f8f7450 EFLAGS: 00010246
[Wed aug 22 01:11:31 2018] RAX: 0000000000000000 RBX: 00000006d7fe91d0 RCX: 00000000003e81aa
[Wed aug 22 01:11:31 2018] RDX: 0000000005bbe6da RSI: 00000006d7fe91e0 RDI: 00000006ecfa2000
[Wed aug 22 01:11:31 2018] RBP: 00000000daffd23a R08: 000000000b77cdc0 R09: 0000000005bbe6da
[Wed aug 22 01:11:31 2018] R10: 00000000016ef9b8 R11: 000000074e1015d8 R12: 0000000000000000
[Wed aug 22 01:11:31 2018] R13: 0000000005bbe6da R14: 0000000647817e70 R15: 00007f2bd8162800
[Wed aug 22 01:11:31 2018] Mem-Info:
[Wed aug 22 01:11:31 2018] active_anon:5272319 inactive_anon:753357 isolated_anon:0
active_file:201 inactive_file:275 isolated_file:0
unevictable:1202 dirty:17 writeback:0 unstable:0
slab_reclaimable:17005 slab_unreclaimable:39032
mapped:720524 shmem:783543 pagetables:15557 bounce:0
free:35825 free_pcp:11 free_cma:0
[Wed aug 22 01:11:31 2018] Node 0 active_anon:11964008kB inactive_anon:90592kB active_file:32kB inactive_file:28kB unevictable:0kB isolated(anon):0kB isolated(file):0kB mapped:40904kB dirty:0kB writeback:0kB shmem:120956kB shmem_thp: 0kB shmem_pmdmapped: 0kB anon_thp: 0kB writeback_tmp:0kB unstable:0kB all_unreclaimable? no
[Wed aug 22 01:11:31 2018] Node 1 active_anon:9125268kB inactive_anon:2922836kB active_file:772kB inactive_file:1072kB unevictable:4808kB isolated(anon):0kB isolated(file):0kB mapped:2841192kB dirty:68kB writeback:0kB shmem:3013216kB shmem_thp: 0kB shmem_pmdmapped: 0kB anon_thp: 0kB writeback_tmp:0kB unstable:0kB all_unreclaimable? no
[Wed aug 22 01:11:31 2018] Node 0 DMA free:15892kB min:56kB low:68kB high:80kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB present:15992kB managed:15908kB mlocked:0kB kernel_stack:0kB pagetables:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
[Wed aug 22 01:11:31 2018] lowmem_reserve[]: 0 3494 12003 12003 12003
[Wed aug 22 01:11:31 2018] Node 0 DMA32 free:47848kB min:12928kB low:16460kB high:19992kB active_anon:3483584kB inactive_anon:12368kB active_file:16kB inactive_file:0kB unevictable:0kB writepending:0kB present:3643584kB managed:3578016kB mlocked:0kB kernel_stack:496kB pagetables:3816kB bounce:0kB free_pcp:4kB local_pcp:4kB free_cma:0kB
[Wed aug 22 01:11:31 2018] lowmem_reserve[]: 0 0 8509 8509 8509
[Wed aug 22 01:11:31 2018] Node 0 Normal free:32604kB min:31892kB low:40604kB high:49316kB active_anon:8480424kB inactive_anon:78224kB active_file:16kB inactive_file:28kB unevictable:0kB writepending:0kB present:8912896kB managed:8713220kB mlocked:0kB kernel_stack:4168kB pagetables:12244kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
[Wed aug 22 01:11:31 2018] lowmem_reserve[]: 0 0 0 0 0
[Wed aug 22 01:11:31 2018] Node 1 Normal free:46956kB min:45228kB low:57584kB high:69940kB active_anon:9124728kB inactive_anon:2922836kB active_file:772kB inactive_file:1072kB unevictable:4808kB writepending:68kB present:12582908kB managed:12361464kB mlocked:4808kB kernel_stack:8456kB pagetables:46168kB bounce:0kB free_pcp:112kB local_pcp:0kB free_cma:0kB
[Wed aug 22 01:11:31 2018] lowmem_reserve[]: 0 0 0 0 0
[Wed aug 22 01:11:31 2018] Node 0 DMA: 1*4kB (U) 0*8kB 1*16kB (U) 0*32kB 2*64kB (U) 1*128kB (U) 1*256kB (U) 0*512kB 1*1024kB (U) 1*2048kB (M) 3*4096kB (M) = 15892kB
[Wed aug 22 01:11:31 2018] Node 0 DMA32: 515*4kB (UMEH) 240*8kB (UMEH) 177*16kB (UMEH) 205*32kB (UMEH) 115*64kB (UMEH) 72*128kB (UEH) 34*256kB (UE) 16*512kB (UME) 1*1024kB (U) 0*2048kB 0*4096kB = 47868kB
[Wed aug 22 01:11:31 2018] Node 0 Normal: 1502*4kB (UMEH) 1712*8kB (UMEH) 359*16kB (UMEH) 122*32kB (UMEH) 36*64kB (UMEH) 7*128kB (M) 1*256kB (M) 0*512kB 0*1024kB 0*2048kB 0*4096kB = 32808kB
[Wed aug 22 01:11:31 2018] Node 1 Normal: 649*4kB (UME) 654*8kB (UMEH) 1047*16kB (UMEH) 473*32kB (UMEH) 86*64kB (UMH) 2*128kB (H) 2*256kB (H) 1*512kB (H) 0*1024kB 0*2048kB 0*4096kB = 46500kB
[Wed aug 22 01:11:31 2018] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB
[Wed aug 22 01:11:31 2018] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
[Wed aug 22 01:11:31 2018] Node 1 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB
[Wed aug 22 01:11:31 2018] Node 1 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
[Wed aug 22 01:11:31 2018] 785008 total pagecache pages
[Wed aug 22 01:11:31 2018] 0 pages in swap cache
[Wed aug 22 01:11:31 2018] Swap cache stats: add 0, delete 0, find 0/0
[Wed aug 22 01:11:31 2018] Free swap = 0kB
[Wed aug 22 01:11:31 2018] Total swap = 0kB
[Wed aug 22 01:11:31 2018] 6288845 pages RAM
[Wed aug 22 01:11:31 2018] 0 pages HighMem/MovableOnly
[Wed aug 22 01:11:31 2018] 121693 pages reserved
[Wed aug 22 01:11:31 2018] 0 pages cma reserved
[Wed aug 22 01:11:31 2018] 0 pages hwpoisoned
[Wed aug 22 01:11:31 2018] [ pid ] uid tgid total_vm rss nr_ptes nr_pmds swapents oom_score_adj name
[Wed aug 22 01:11:31 2018] [ 407] 0 407 10891 3826 26 3 0 0 systemd-journal
[Wed aug 22 01:11:31 2018] [ 421] 0 421 25742 66 18 3 0 0 lvmetad
[Wed aug 22 01:11:31 2018] [ 872] 0 872 1099 20 8 3 0 0 acpid
[Wed aug 22 01:11:31 2018] [ 875] 103 875 10777 266 25 3 0 -900 dbus-daemon
[Wed aug 22 01:11:31 2018] [ 889] 0 889 6511 51 18 3 0 0 atd
[Wed aug 22 01:11:31 2018] [ 897] 101 897 65157 448 30 3 0 0 rsyslogd
[Wed aug 22 01:11:31 2018] [ 898] 0 898 7558 71 20 3 0 0 cron
[Wed aug 22 01:11:31 2018] [ 901] 0 901 7554 507 19 3 0 0 systemd-logind
[Wed aug 22 01:11:31 2018] [ 914] 0 914 69788 682 37 3 0 0 accounts-daemon
[Wed aug 22 01:11:31 2018] [ 943] 0 943 7469 52 20 3 0 0 cgmanager
[Wed aug 22 01:11:31 2018] [ 1094] 0 1094 4898 85 13 3 0 0 irqbalance
[Wed aug 22 01:11:31 2018] [ 1210] 0 1210 69277 202 40 4 0 0 polkitd
[Wed aug 22 01:11:31 2018] [ 1292] 0 1292 2631 29 9 3 0 0 hp-asrd
[Wed aug 22 01:11:31 2018] [ 1294] 0 1294 2665 35 10 3 0 0 hp-asrd
[Wed aug 22 01:11:31 2018] [ 1331] 0 1331 16377 180 35 3 0 -1000 sshd
[Wed aug 22 01:11:31 2018] [ 1389] 0 1389 4290 32 13 3 0 0 agetty
[Wed aug 22 01:11:31 2018] [ 1411] 0 1411 31618 1017 65 3 0 0 hpsmhd
[Wed aug 22 01:11:31 2018] [ 1415] 0 1415 8167 50 20 3 0 0 rotatelogs
[Wed aug 22 01:11:31 2018] [ 1416] 0 1416 8167 55 21 3 0 0 rotatelogs
[Wed aug 22 01:11:31 2018] [ 1417] 111 1417 31651 1009 57 3 0 0 hpsmhd
[Wed aug 22 01:11:31 2018] [ 1418] 111 1418 31656 1035 58 3 0 0 hpsmhd
[Wed aug 22 01:11:31 2018] [ 1419] 111 1419 31651 1009 57 3 0 0 hpsmhd
[Wed aug 22 01:11:31 2018] [ 1420] 111 1420 31651 1009 57 3 0 0 hpsmhd
[Wed aug 22 01:11:31 2018] [ 1421] 111 1421 31651 1035 58 3 0 0 hpsmhd
[Wed aug 22 01:11:31 2018] [ 2211] 108 2211 12495 96 27 3 0 0 dnsmasq
[Wed aug 22 01:11:31 2018] [ 2212] 0 2212 12488 93 27 3 0 0 dnsmasq
[Wed aug 22 01:11:31 2018] [16086] 1004 16086 5161 1486 14 5 0 0 gnatsd
[Wed aug 22 01:11:31 2018] [26914] 111 26914 31656 1006 57 3 0 0 hpsmhd
[Wed aug 22 01:11:31 2018] [26036] 0 26036 11165 148 21 3 0 -1000 systemd-udevd
[Wed aug 22 01:11:31 2018] [29056] 100 29056 7345 48 19 3 0 0 uuidd
[Wed aug 22 01:11:31 2018] [29767] 0 29767 30229 203 45 3 0 -900 virtlogd
[Wed aug 22 01:11:31 2018] [29830] 0 29830 30230 204 43 3 0 -900 virtlockd
[Wed aug 22 01:11:31 2018] [11118] 1002 11118 11319 214 26 3 0 0 systemd
[Wed aug 22 01:11:31 2018] [11119] 1002 11119 15857 495 33 3 0 0 (sd-pam)
[Wed aug 22 01:11:31 2018] [26385] 999 26385 11497 282 19 3 0 0 redis-server
[Wed aug 22 01:11:31 2018] [ 751] 110 751 27508 167 25 3 0 0 ntpd
[Wed aug 22 01:11:31 2018] [ 8160] 0 8160 202275 2186 161 3 0 0 libvirtd
[Wed aug 22 01:11:31 2018] [13344] 105 13344 78205 3099 74 3 0 -900 postgres
[Wed aug 22 01:11:31 2018] [13417] 105 13417 1043259 675565 1390 7 0 -900 postgres
[Wed aug 22 01:11:31 2018] [13747] 105 13747 78230 889 68 3 0 0 postgres
[Wed aug 22 01:11:31 2018] [13749] 105 13749 78205 828 68 3 0 0 postgres
[Wed aug 22 01:11:31 2018] [13752] 105 13752 78205 1550 66 3 0 0 postgres
[Wed aug 22 01:11:31 2018] [13756] 105 13756 78308 754 68 3 0 0 postgres
[Wed aug 22 01:11:31 2018] [13757] 105 13757 41938 496 62 3 0 0 postgres
[Wed aug 22 01:11:31 2018] [13758] 105 13758 78280 660 67 3 0 0 postgres
[Wed aug 22 01:11:31 2018] [17793] 105 17793 1043259 541 63 5 0 0 postgres
[Wed aug 22 01:11:31 2018] [17794] 105 17794 1043259 2549 68 6 0 0 postgres
[Wed aug 22 01:11:31 2018] [17796] 105 17796 1043259 505 62 5 0 0 postgres
[Wed aug 22 01:11:31 2018] [17800] 105 17800 1043388 736 70 6 0 0 postgres
[Wed aug 22 01:11:31 2018] [17801] 105 17801 40652 490 60 4 0 0 postgres
[Wed aug 22 01:11:31 2018] [ 6550] 0 6550 3764 49 13 3 0 0 xinetd
[Wed aug 22 01:11:31 2018] [ 6755] 0 6755 16352 117 24 3 0 0 master
[Wed aug 22 01:11:31 2018] [ 6757] 119 6757 16910 206 25 3 0 0 qmgr
[Wed aug 22 01:11:31 2018] [28294] 113 28294 23238 214 47 3 0 0 zabbix_agentd
[Wed aug 22 01:11:31 2018] [28305] 113 28305 23238 457 47 3 0 0 zabbix_agentd
[Wed aug 22 01:11:31 2018] [28306] 113 28306 23238 299 47 3 0 0 zabbix_agentd
[Wed aug 22 01:11:31 2018] [28307] 113 28307 23238 235 47 3 0 0 zabbix_agentd
[Wed aug 22 01:11:31 2018] [28308] 113 28308 23238 240 47 3 0 0 zabbix_agentd
[Wed aug 22 01:11:31 2018] [28309] 113 28309 23238 221 47 3 0 0 zabbix_agentd
[Wed aug 22 01:11:31 2018] [17160] 1001 17160 6631678 3848634 7889 27 0 0 java
[Wed aug 22 01:11:31 2018] [19768] 0 19768 77523 39704 141 4 0 0 apache2
[Wed aug 22 01:11:31 2018] [21477] 33 21477 461614 2304 114 6 0 0 apache2
[Wed aug 22 01:11:31 2018] [ 2617] 33 2617 346974 2057 107 5 0 0 apache2
[Wed aug 22 01:11:31 2018] [ 4691] 1001 4691 4086425 1370850 2998 19 0 0 java
[Wed aug 22 01:11:31 2018] [ 5911] 0 5911 24241 716 52 3 0 0 sshd
[Wed aug 22 01:11:31 2018] [ 5914] 1000 5914 11319 547 27 3 0 0 systemd
[Wed aug 22 01:11:31 2018] [ 5915] 1000 5915 15981 647 33 3 0 0 (sd-pam)
[Wed aug 22 01:11:31 2018] [ 5974] 1000 5974 24241 596 51 3 0 0 sshd
[Wed aug 22 01:11:31 2018] [ 5975] 1000 5975 5973 940 17 3 0 0 bash
[Wed aug 22 01:11:31 2018] [ 5992] 0 5992 15653 582 36 3 0 0 sudo
[Wed aug 22 01:11:31 2018] [ 6041] 0 6041 5947 895 17 3 0 0 bash
[Wed aug 22 01:11:31 2018] [ 6104] 0 6104 15918 935 36 3 0 0 mc
[Wed aug 22 01:11:31 2018] [ 6106] 0 6106 5947 910 15 3 0 0 bash
[Wed aug 22 01:11:31 2018] [14474] 33 14474 281509 2009 102 5 0 0 apache2
[Wed aug 22 01:11:31 2018] [ 9477] 0 9477 4274 1204 13 3 0 0 atop
[Wed aug 22 01:11:31 2018] [12755] 119 12755 16869 162 26 3 0 0 pickup
[Wed aug 22 01:11:31 2018] [15069] 113 15069 1126 145 7 3 0 0 sh
[Wed aug 22 01:11:31 2018] [15070] 113 15070 5550 458 16 3 0 0 check.sh
[Wed aug 22 01:11:31 2018] [15073] 113 15073 3862 391 12 3 0 0 grep
[Wed aug 22 01:11:31 2018] [15074] 113 15074 3868 89 13 3 0 0 grep
[Wed aug 22 01:11:31 2018] [15075] 113 15075 1126 143 7 3 0 0 sh
[Wed aug 22 01:11:31 2018] [15076] 113 15076 5550 444 16 3 0 0 check1.sh
[Wed aug 22 01:11:31 2018] [15079] 113 15079 3862 73 12 3 0 0 grep
[Wed aug 22 01:11:31 2018] [15080] 113 15080 3868 72 12 3 0 0 grep
[Wed aug 22 01:11:31 2018] [15081] 113 15081 1126 149 8 3 0 0 sh
[Wed aug 22 01:11:31 2018] [15082] 113 15082 5550 456 17 3 0 0 check4.sh
[Wed aug 22 01:11:31 2018] [15085] 113 15085 3862 71 14 3 0 0 grep
[Wed aug 22 01:11:31 2018] [15086] 113 15086 3868 51 13 3 0 0 grep
[Wed aug 22 01:11:31 2018] Out of memory: Kill process 17160 (java) score 625 or sacrifice child
[Wed aug 22 01:11:31 2018] Killed process 17160 (java) total-vm:26526712kB, anon-rss:15394536kB, file-rss:0kB, shmem-rss:0kB
I've read some answers about reading OOM killer logs, but my numbers doesn't seem to add up. First, some answers are about running out of memory pages registry space on 32bit systems. This is not the case, we have 64bit system.
What I don't quite understand is:
- Measuring units. RSS column should be in
pages
. But if I sum it, I get13705822. Multiplied by 4096 it would be 56139046912 (52Gb), but we have only 24Gb of RAM and no swap. If it is in9963350. Multiplied by 4096 it would be 40809881600 (38Gb), but we have only 24Gb of RAM and no swap. If it is inkB
instead, it would be 14034761728 (13Gb)kB
instead, it would be 10202470400 (9,5Gb)and a lot of memory was free? (sorry, I was summing wrong column, but problem is the same) - The log says there was
785008 total pagecache pages
, but for some reason the system didn't evict them? It couldn't? Or didn't even try? this is probably related to known problem of postgres shared buffers being shown as page cache - Adding [presumably] 13Gb of RSS in kB + 3Gb of pagecache gives 16Gb, which is still much less than total of 24Gb.
Node 0 Normal free:32604kB
andNode 1 Normal free:46956kB
means there was a bit of memory free. A lot more than 1 page, but just 0,3% of total which might be below some threshold? Or did application ask for a bigger region at once?- process 4691 which is second-most memory consumer is not in the system anymore, but another process with big uptime is, but under another pid. Somehow it is not seen in the log, despite its uptime is greater than the difference from oom killer log to now. Could this process just change the pid at runtime?
Other info: OS Ubuntu 16.04, 24Gb RAM, no swap.