0

First, the log:

[Wed aug 22 01:11:31 2018] java invoked oom-killer: gfp_mask=0x14280ca(GFP_HIGHUSER_MOVABLE|__GFP_ZERO), nodemask=(null),  order=0, oom_score_adj=0
[Wed aug 22 01:11:31 2018] java cpuset=/ mems_allowed=0-1
[Wed aug 22 01:11:31 2018] CPU: 5 PID: 4726 Comm: java Tainted: G          I     4.13.0-32-generic #35~16.04.1-Ubuntu
[Wed aug 22 01:11:31 2018] Hardware name: HP ProLiant DL380 G7, BIOS P67 05/14/2010
[Wed aug 22 01:11:31 2018] Call Trace:
[Wed aug 22 01:11:31 2018]  dump_stack+0x63/0x8b
[Wed aug 22 01:11:31 2018]  dump_header+0x97/0x225
[Wed aug 22 01:11:31 2018]  ? security_capable_noaudit+0x45/0x60
[Wed aug 22 01:11:31 2018]  oom_kill_process+0x219/0x420
[Wed aug 22 01:11:31 2018]  out_of_memory+0x11d/0x4b0
[Wed aug 22 01:11:31 2018]  __alloc_pages_slowpath+0xd2e/0xe10
[Wed aug 22 01:11:31 2018]  ? apic_timer_interrupt+0x101/0x1c0
[Wed aug 22 01:11:31 2018]  ? apic_timer_interrupt+0xd7/0x1c0
[Wed aug 22 01:11:31 2018]  __alloc_pages_nodemask+0x260/0x280
[Wed aug 22 01:11:31 2018]  alloc_pages_vma+0x88/0x1e0
[Wed aug 22 01:11:31 2018]  __handle_mm_fault+0xc7b/0x1070
[Wed aug 22 01:11:31 2018]  handle_mm_fault+0xcc/0x1c0
[Wed aug 22 01:11:31 2018]  __do_page_fault+0x258/0x4f0
[Wed aug 22 01:11:31 2018]  do_page_fault+0x22/0x30
[Wed aug 22 01:11:31 2018]  ? page_fault+0x36/0x60
[Wed aug 22 01:11:31 2018]  page_fault+0x4c/0x60
[Wed aug 22 01:11:31 2018] RIP: 0033:0x7f2bc96dd62e
[Wed aug 22 01:11:31 2018] RSP: 002b:00007f2b4f8f7450 EFLAGS: 00010246
[Wed aug 22 01:11:31 2018] RAX: 0000000000000000 RBX: 00000006d7fe91d0 RCX: 00000000003e81aa
[Wed aug 22 01:11:31 2018] RDX: 0000000005bbe6da RSI: 00000006d7fe91e0 RDI: 00000006ecfa2000
[Wed aug 22 01:11:31 2018] RBP: 00000000daffd23a R08: 000000000b77cdc0 R09: 0000000005bbe6da
[Wed aug 22 01:11:31 2018] R10: 00000000016ef9b8 R11: 000000074e1015d8 R12: 0000000000000000
[Wed aug 22 01:11:31 2018] R13: 0000000005bbe6da R14: 0000000647817e70 R15: 00007f2bd8162800
[Wed aug 22 01:11:31 2018] Mem-Info:
[Wed aug 22 01:11:31 2018] active_anon:5272319 inactive_anon:753357 isolated_anon:0
                                active_file:201 inactive_file:275 isolated_file:0
                                unevictable:1202 dirty:17 writeback:0 unstable:0
                                slab_reclaimable:17005 slab_unreclaimable:39032
                                mapped:720524 shmem:783543 pagetables:15557 bounce:0
                                free:35825 free_pcp:11 free_cma:0
[Wed aug 22 01:11:31 2018] Node 0 active_anon:11964008kB inactive_anon:90592kB active_file:32kB inactive_file:28kB unevictable:0kB isolated(anon):0kB isolated(file):0kB mapped:40904kB dirty:0kB writeback:0kB shmem:120956kB shmem_thp: 0kB shmem_pmdmapped: 0kB anon_thp: 0kB writeback_tmp:0kB unstable:0kB all_unreclaimable? no
[Wed aug 22 01:11:31 2018] Node 1 active_anon:9125268kB inactive_anon:2922836kB active_file:772kB inactive_file:1072kB unevictable:4808kB isolated(anon):0kB isolated(file):0kB mapped:2841192kB dirty:68kB writeback:0kB shmem:3013216kB shmem_thp: 0kB shmem_pmdmapped: 0kB anon_thp: 0kB writeback_tmp:0kB unstable:0kB all_unreclaimable? no
[Wed aug 22 01:11:31 2018] Node 0 DMA free:15892kB min:56kB low:68kB high:80kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB present:15992kB managed:15908kB mlocked:0kB kernel_stack:0kB pagetables:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
[Wed aug 22 01:11:31 2018] lowmem_reserve[]: 0 3494 12003 12003 12003
[Wed aug 22 01:11:31 2018] Node 0 DMA32 free:47848kB min:12928kB low:16460kB high:19992kB active_anon:3483584kB inactive_anon:12368kB active_file:16kB inactive_file:0kB unevictable:0kB writepending:0kB present:3643584kB managed:3578016kB mlocked:0kB kernel_stack:496kB pagetables:3816kB bounce:0kB free_pcp:4kB local_pcp:4kB free_cma:0kB
[Wed aug 22 01:11:31 2018] lowmem_reserve[]: 0 0 8509 8509 8509
[Wed aug 22 01:11:31 2018] Node 0 Normal free:32604kB min:31892kB low:40604kB high:49316kB active_anon:8480424kB inactive_anon:78224kB active_file:16kB inactive_file:28kB unevictable:0kB writepending:0kB present:8912896kB managed:8713220kB mlocked:0kB kernel_stack:4168kB pagetables:12244kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
[Wed aug 22 01:11:31 2018] lowmem_reserve[]: 0 0 0 0 0
[Wed aug 22 01:11:31 2018] Node 1 Normal free:46956kB min:45228kB low:57584kB high:69940kB active_anon:9124728kB inactive_anon:2922836kB active_file:772kB inactive_file:1072kB unevictable:4808kB writepending:68kB present:12582908kB managed:12361464kB mlocked:4808kB kernel_stack:8456kB pagetables:46168kB bounce:0kB free_pcp:112kB local_pcp:0kB free_cma:0kB
[Wed aug 22 01:11:31 2018] lowmem_reserve[]: 0 0 0 0 0
[Wed aug 22 01:11:31 2018] Node 0 DMA: 1*4kB (U) 0*8kB 1*16kB (U) 0*32kB 2*64kB (U) 1*128kB (U) 1*256kB (U) 0*512kB 1*1024kB (U) 1*2048kB (M) 3*4096kB (M) = 15892kB
[Wed aug 22 01:11:31 2018] Node 0 DMA32: 515*4kB (UMEH) 240*8kB (UMEH) 177*16kB (UMEH) 205*32kB (UMEH) 115*64kB (UMEH) 72*128kB (UEH) 34*256kB (UE) 16*512kB (UME) 1*1024kB (U) 0*2048kB 0*4096kB = 47868kB
[Wed aug 22 01:11:31 2018] Node 0 Normal: 1502*4kB (UMEH) 1712*8kB (UMEH) 359*16kB (UMEH) 122*32kB (UMEH) 36*64kB (UMEH) 7*128kB (M) 1*256kB (M) 0*512kB 0*1024kB 0*2048kB 0*4096kB = 32808kB
[Wed aug 22 01:11:31 2018] Node 1 Normal: 649*4kB (UME) 654*8kB (UMEH) 1047*16kB (UMEH) 473*32kB (UMEH) 86*64kB (UMH) 2*128kB (H) 2*256kB (H) 1*512kB (H) 0*1024kB 0*2048kB 0*4096kB = 46500kB
[Wed aug 22 01:11:31 2018] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB
[Wed aug 22 01:11:31 2018] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
[Wed aug 22 01:11:31 2018] Node 1 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB
[Wed aug 22 01:11:31 2018] Node 1 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
[Wed aug 22 01:11:31 2018] 785008 total pagecache pages
[Wed aug 22 01:11:31 2018] 0 pages in swap cache
[Wed aug 22 01:11:31 2018] Swap cache stats: add 0, delete 0, find 0/0
[Wed aug 22 01:11:31 2018] Free swap  = 0kB
[Wed aug 22 01:11:31 2018] Total swap = 0kB
[Wed aug 22 01:11:31 2018] 6288845 pages RAM
[Wed aug 22 01:11:31 2018] 0 pages HighMem/MovableOnly
[Wed aug 22 01:11:31 2018] 121693 pages reserved
[Wed aug 22 01:11:31 2018] 0 pages cma reserved
[Wed aug 22 01:11:31 2018] 0 pages hwpoisoned
[Wed aug 22 01:11:31 2018] [ pid ]   uid  tgid total_vm      rss nr_ptes nr_pmds swapents oom_score_adj name
[Wed aug 22 01:11:31 2018] [  407]     0   407    10891     3826      26       3        0             0 systemd-journal
[Wed aug 22 01:11:31 2018] [  421]     0   421    25742       66      18       3        0             0 lvmetad
[Wed aug 22 01:11:31 2018] [  872]     0   872     1099       20       8       3        0             0 acpid
[Wed aug 22 01:11:31 2018] [  875]   103   875    10777      266      25       3        0          -900 dbus-daemon
[Wed aug 22 01:11:31 2018] [  889]     0   889     6511       51      18       3        0             0 atd
[Wed aug 22 01:11:31 2018] [  897]   101   897    65157      448      30       3        0             0 rsyslogd
[Wed aug 22 01:11:31 2018] [  898]     0   898     7558       71      20       3        0             0 cron
[Wed aug 22 01:11:31 2018] [  901]     0   901     7554      507      19       3        0             0 systemd-logind
[Wed aug 22 01:11:31 2018] [  914]     0   914    69788      682      37       3        0             0 accounts-daemon
[Wed aug 22 01:11:31 2018] [  943]     0   943     7469       52      20       3        0             0 cgmanager
[Wed aug 22 01:11:31 2018] [ 1094]     0  1094     4898       85      13       3        0             0 irqbalance
[Wed aug 22 01:11:31 2018] [ 1210]     0  1210    69277      202      40       4        0             0 polkitd
[Wed aug 22 01:11:31 2018] [ 1292]     0  1292     2631       29       9       3        0             0 hp-asrd
[Wed aug 22 01:11:31 2018] [ 1294]     0  1294     2665       35      10       3        0             0 hp-asrd
[Wed aug 22 01:11:31 2018] [ 1331]     0  1331    16377      180      35       3        0         -1000 sshd
[Wed aug 22 01:11:31 2018] [ 1389]     0  1389     4290       32      13       3        0             0 agetty
[Wed aug 22 01:11:31 2018] [ 1411]     0  1411    31618     1017      65       3        0             0 hpsmhd
[Wed aug 22 01:11:31 2018] [ 1415]     0  1415     8167       50      20       3        0             0 rotatelogs
[Wed aug 22 01:11:31 2018] [ 1416]     0  1416     8167       55      21       3        0             0 rotatelogs
[Wed aug 22 01:11:31 2018] [ 1417]   111  1417    31651     1009      57       3        0             0 hpsmhd
[Wed aug 22 01:11:31 2018] [ 1418]   111  1418    31656     1035      58       3        0             0 hpsmhd
[Wed aug 22 01:11:31 2018] [ 1419]   111  1419    31651     1009      57       3        0             0 hpsmhd
[Wed aug 22 01:11:31 2018] [ 1420]   111  1420    31651     1009      57       3        0             0 hpsmhd
[Wed aug 22 01:11:31 2018] [ 1421]   111  1421    31651     1035      58       3        0             0 hpsmhd
[Wed aug 22 01:11:31 2018] [ 2211]   108  2211    12495       96      27       3        0             0 dnsmasq
[Wed aug 22 01:11:31 2018] [ 2212]     0  2212    12488       93      27       3        0             0 dnsmasq
[Wed aug 22 01:11:31 2018] [16086]  1004 16086     5161     1486      14       5        0             0 gnatsd
[Wed aug 22 01:11:31 2018] [26914]   111 26914    31656     1006      57       3        0             0 hpsmhd
[Wed aug 22 01:11:31 2018] [26036]     0 26036    11165      148      21       3        0         -1000 systemd-udevd
[Wed aug 22 01:11:31 2018] [29056]   100 29056     7345       48      19       3        0             0 uuidd
[Wed aug 22 01:11:31 2018] [29767]     0 29767    30229      203      45       3        0          -900 virtlogd
[Wed aug 22 01:11:31 2018] [29830]     0 29830    30230      204      43       3        0          -900 virtlockd
[Wed aug 22 01:11:31 2018] [11118]  1002 11118    11319      214      26       3        0             0 systemd
[Wed aug 22 01:11:31 2018] [11119]  1002 11119    15857      495      33       3        0             0 (sd-pam)
[Wed aug 22 01:11:31 2018] [26385]   999 26385    11497      282      19       3        0             0 redis-server
[Wed aug 22 01:11:31 2018] [  751]   110   751    27508      167      25       3        0             0 ntpd
[Wed aug 22 01:11:31 2018] [ 8160]     0  8160   202275     2186     161       3        0             0 libvirtd
[Wed aug 22 01:11:31 2018] [13344]   105 13344    78205     3099      74       3        0          -900 postgres
[Wed aug 22 01:11:31 2018] [13417]   105 13417  1043259   675565    1390       7        0          -900 postgres
[Wed aug 22 01:11:31 2018] [13747]   105 13747    78230      889      68       3        0             0 postgres
[Wed aug 22 01:11:31 2018] [13749]   105 13749    78205      828      68       3        0             0 postgres
[Wed aug 22 01:11:31 2018] [13752]   105 13752    78205     1550      66       3        0             0 postgres
[Wed aug 22 01:11:31 2018] [13756]   105 13756    78308      754      68       3        0             0 postgres
[Wed aug 22 01:11:31 2018] [13757]   105 13757    41938      496      62       3        0             0 postgres
[Wed aug 22 01:11:31 2018] [13758]   105 13758    78280      660      67       3        0             0 postgres
[Wed aug 22 01:11:31 2018] [17793]   105 17793  1043259      541      63       5        0             0 postgres
[Wed aug 22 01:11:31 2018] [17794]   105 17794  1043259     2549      68       6        0             0 postgres
[Wed aug 22 01:11:31 2018] [17796]   105 17796  1043259      505      62       5        0             0 postgres
[Wed aug 22 01:11:31 2018] [17800]   105 17800  1043388      736      70       6        0             0 postgres
[Wed aug 22 01:11:31 2018] [17801]   105 17801    40652      490      60       4        0             0 postgres
[Wed aug 22 01:11:31 2018] [ 6550]     0  6550     3764       49      13       3        0             0 xinetd
[Wed aug 22 01:11:31 2018] [ 6755]     0  6755    16352      117      24       3        0             0 master
[Wed aug 22 01:11:31 2018] [ 6757]   119  6757    16910      206      25       3        0             0 qmgr
[Wed aug 22 01:11:31 2018] [28294]   113 28294    23238      214      47       3        0             0 zabbix_agentd
[Wed aug 22 01:11:31 2018] [28305]   113 28305    23238      457      47       3        0             0 zabbix_agentd
[Wed aug 22 01:11:31 2018] [28306]   113 28306    23238      299      47       3        0             0 zabbix_agentd
[Wed aug 22 01:11:31 2018] [28307]   113 28307    23238      235      47       3        0             0 zabbix_agentd
[Wed aug 22 01:11:31 2018] [28308]   113 28308    23238      240      47       3        0             0 zabbix_agentd
[Wed aug 22 01:11:31 2018] [28309]   113 28309    23238      221      47       3        0             0 zabbix_agentd
[Wed aug 22 01:11:31 2018] [17160]  1001 17160  6631678  3848634    7889      27        0             0 java
[Wed aug 22 01:11:31 2018] [19768]     0 19768    77523    39704     141       4        0             0 apache2
[Wed aug 22 01:11:31 2018] [21477]    33 21477   461614     2304     114       6        0             0 apache2
[Wed aug 22 01:11:31 2018] [ 2617]    33  2617   346974     2057     107       5        0             0 apache2
[Wed aug 22 01:11:31 2018] [ 4691]  1001  4691  4086425  1370850    2998      19        0             0 java
[Wed aug 22 01:11:31 2018] [ 5911]     0  5911    24241      716      52       3        0             0 sshd
[Wed aug 22 01:11:31 2018] [ 5914]  1000  5914    11319      547      27       3        0             0 systemd
[Wed aug 22 01:11:31 2018] [ 5915]  1000  5915    15981      647      33       3        0             0 (sd-pam)
[Wed aug 22 01:11:31 2018] [ 5974]  1000  5974    24241      596      51       3        0             0 sshd
[Wed aug 22 01:11:31 2018] [ 5975]  1000  5975     5973      940      17       3        0             0 bash
[Wed aug 22 01:11:31 2018] [ 5992]     0  5992    15653      582      36       3        0             0 sudo
[Wed aug 22 01:11:31 2018] [ 6041]     0  6041     5947      895      17       3        0             0 bash
[Wed aug 22 01:11:31 2018] [ 6104]     0  6104    15918      935      36       3        0             0 mc
[Wed aug 22 01:11:31 2018] [ 6106]     0  6106     5947      910      15       3        0             0 bash
[Wed aug 22 01:11:31 2018] [14474]    33 14474   281509     2009     102       5        0             0 apache2
[Wed aug 22 01:11:31 2018] [ 9477]     0  9477     4274     1204      13       3        0             0 atop
[Wed aug 22 01:11:31 2018] [12755]   119 12755    16869      162      26       3        0             0 pickup
[Wed aug 22 01:11:31 2018] [15069]   113 15069     1126      145       7       3        0             0 sh
[Wed aug 22 01:11:31 2018] [15070]   113 15070     5550      458      16       3        0             0 check.sh
[Wed aug 22 01:11:31 2018] [15073]   113 15073     3862      391      12       3        0             0 grep
[Wed aug 22 01:11:31 2018] [15074]   113 15074     3868       89      13       3        0             0 grep
[Wed aug 22 01:11:31 2018] [15075]   113 15075     1126      143       7       3        0             0 sh
[Wed aug 22 01:11:31 2018] [15076]   113 15076     5550      444      16       3        0             0 check1.sh
[Wed aug 22 01:11:31 2018] [15079]   113 15079     3862       73      12       3        0             0 grep
[Wed aug 22 01:11:31 2018] [15080]   113 15080     3868       72      12       3        0             0 grep
[Wed aug 22 01:11:31 2018] [15081]   113 15081     1126      149       8       3        0             0 sh
[Wed aug 22 01:11:31 2018] [15082]   113 15082     5550      456      17       3        0             0 check4.sh
[Wed aug 22 01:11:31 2018] [15085]   113 15085     3862       71      14       3        0             0 grep
[Wed aug 22 01:11:31 2018] [15086]   113 15086     3868       51      13       3        0             0 grep
[Wed aug 22 01:11:31 2018] Out of memory: Kill process 17160 (java) score 625 or sacrifice child
[Wed aug 22 01:11:31 2018] Killed process 17160 (java) total-vm:26526712kB, anon-rss:15394536kB, file-rss:0kB, shmem-rss:0kB

I've read some answers about reading OOM killer logs, but my numbers doesn't seem to add up. First, some answers are about running out of memory pages registry space on 32bit systems. This is not the case, we have 64bit system.

What I don't quite understand is:

  • Measuring units. RSS column should be in pages. But if I sum it, I get 13705822. Multiplied by 4096 it would be 56139046912 (52Gb), but we have only 24Gb of RAM and no swap. If it is in kB instead, it would be 14034761728 (13Gb) 9963350. Multiplied by 4096 it would be 40809881600 (38Gb), but we have only 24Gb of RAM and no swap. If it is in kB instead, it would be 10202470400 (9,5Gb)and a lot of memory was free? (sorry, I was summing wrong column, but problem is the same)
  • The log says there was 785008 total pagecache pages, but for some reason the system didn't evict them? It couldn't? Or didn't even try? this is probably related to known problem of postgres shared buffers being shown as page cache
  • Adding [presumably] 13Gb of RSS in kB + 3Gb of pagecache gives 16Gb, which is still much less than total of 24Gb.
  • Node 0 Normal free:32604kB and Node 1 Normal free:46956kB means there was a bit of memory free. A lot more than 1 page, but just 0,3% of total which might be below some threshold? Or did application ask for a bigger region at once?
  • process 4691 which is second-most memory consumer is not in the system anymore, but another process with big uptime is, but under another pid. Somehow it is not seen in the log, despite its uptime is greater than the difference from oom killer log to now. Could this process just change the pid at runtime?

Other info: OS Ubuntu 16.04, 24Gb RAM, no swap.

Imaskar
  • 123
  • 5
  • Shared memory can count towards the RSS of more than one process. – kasperd Aug 23 '18 at 07:28
  • @kasperd how to get accurate data then? – Imaskar Aug 23 '18 at 07:38
  • @lmaskar 6288845 pages RAM = echo $((6288845*4096/1024/1024/1024)) 23 – c4f4t0r Aug 23 '18 at 07:41
  • @c4f4t0r but it shows 9963350 pages. The problem is - server seems to be in good state but suddenly goes OOM. I look at `free -m` and see an okay margin, but in reality it is on the edge. – Imaskar Aug 23 '18 at 08:07
  • @lmaskar I don't see the number 9963350 in your answer output – c4f4t0r Aug 23 '18 at 08:56
  • @c4f4t0r it is a sum of RSS column. I get it, that some pages are shared, but what is the way to find real memory hog? Java looks to take more pages, but what amount of them are shared? If we don't know that, we can't see 1) who to blame 2) when is memory usage becomes critical. – Imaskar Aug 23 '18 at 10:41

0 Answers0