0

My server has died several times in the middle of the night because of this. How much memory is apt-upgrade asking for? Is it being reasonable? I'm trying to decipher the logs here. Any help or advice appreciated!

Amazon EC2 server with 1GB of memory. Ubuntu 18.04.5 LTS (Bionic Beaver)

Mar 25 06:46:01 ip-172-31-30-204 systemd[1]: Starting Daily apt upgrade and clean activities...
Mar 25 06:46:32 ip-172-31-30-204 kernel: [17858349.631907] snapd invoked oom-killer: gfp_mask=0x100cca(GFP_HIGHUSER_MOVABLE), order=0, oom_score_adj=-900
Mar 25 06:46:32 ip-172-31-30-204 kernel: [17858349.631911] CPU: 1 PID: 31093 Comm: snapd Not tainted 5.3.0-1032-aws #34~18.04.2-Ubuntu
Mar 25 06:46:32 ip-172-31-30-204 kernel: [17858349.631912] Hardware name: Amazon EC2 t3.micro/, BIOS 1.0 10/16/2017
Mar 25 06:46:32 ip-172-31-30-204 kernel: [17858349.631912] Call Trace:
Mar 25 06:46:32 ip-172-31-30-204 kernel: [17858349.631921]  dump_stack+0x6d/0x95
Mar 25 06:46:32 ip-172-31-30-204 kernel: [17858349.631925]  dump_header+0x4f/0x200
Mar 25 06:46:32 ip-172-31-30-204 kernel: [17858349.631927]  oom_kill_process+0xe6/0x120
Mar 25 06:46:32 ip-172-31-30-204 kernel: [17858349.631929]  out_of_memory+0x109/0x510
Mar 25 06:46:32 ip-172-31-30-204 kernel: [17858349.631931]  __alloc_pages_slowpath+0xad1/0xe10
Mar 25 06:46:32 ip-172-31-30-204 kernel: [17858349.631934]  ? __switch_to_asm+0x34/0x70
Mar 25 06:46:32 ip-172-31-30-204 kernel: [17858349.631936]  ? __switch_to_asm+0x40/0x70
Mar 25 06:46:32 ip-172-31-30-204 kernel: [17858349.631938]  __alloc_pages_nodemask+0x2cd/0x320
Mar 25 06:46:32 ip-172-31-30-204 kernel: [17858349.631942]  alloc_pages_current+0x6a/0xe0
Mar 25 06:46:32 ip-172-31-30-204 kernel: [17858349.631944]  __page_cache_alloc+0x6a/0xa0
Mar 25 06:46:32 ip-172-31-30-204 kernel: [17858349.631946]  pagecache_get_page+0x9c/0x2b0
Mar 25 06:46:32 ip-172-31-30-204 kernel: [17858349.631947]  filemap_fault+0x66d/0xb60
Mar 25 06:46:32 ip-172-31-30-204 kernel: [17858349.631950]  ? page_add_file_rmap+0x5e/0x150
Mar 25 06:46:32 ip-172-31-30-204 kernel: [17858349.631952]  ? alloc_set_pte+0x113/0x5f0
Mar 25 06:46:32 ip-172-31-30-204 kernel: [17858349.631956]  ? xas_load+0xc/0x80
Mar 25 06:46:32 ip-172-31-30-204 kernel: [17858349.631957]  ? xas_find+0x16f/0x1b0
Mar 25 06:46:32 ip-172-31-30-204 kernel: [17858349.631959]  ? filemap_map_pages+0x18f/0x380
Mar 25 06:46:32 ip-172-31-30-204 kernel: [17858349.631961]  __do_fault+0x57/0x110
Mar 25 06:46:32 ip-172-31-30-204 kernel: [17858349.631963]  __handle_mm_fault+0xdd8/0x1260
Mar 25 06:46:32 ip-172-31-30-204 kernel: [17858349.631965]  handle_mm_fault+0xcb/0x210
Mar 25 06:46:32 ip-172-31-30-204 kernel: [17858349.631969]  __do_page_fault+0x2a1/0x4d0
Mar 25 06:46:32 ip-172-31-30-204 kernel: [17858349.631971]  do_page_fault+0x2c/0xe0
Mar 25 06:46:32 ip-172-31-30-204 kernel: [17858349.631974]  do_async_page_fault+0x54/0x70
Mar 25 06:46:32 ip-172-31-30-204 kernel: [17858349.631977]  async_page_fault+0x34/0x40
Mar 25 06:46:32 ip-172-31-30-204 kernel: [17858349.631979] RIP: 0033:0x55cf079179d8
Mar 25 06:46:32 ip-172-31-30-204 kernel: [17858349.631983] Code: Bad RIP value.
Mar 25 06:46:32 ip-172-31-30-204 kernel: [17858349.631984] RSP: 002b:000000c420063b60 EFLAGS: 00010283
Mar 25 06:46:32 ip-172-31-30-204 kernel: [17858349.631986] RAX: 000055cf08c6f5a0 RBX: 0000000000000000 RCX: 000055cf0869de80
Mar 25 06:46:32 ip-172-31-30-204 kernel: [17858349.631987] RDX: 0000000000000018 RSI: 000000c420606068 RDI: 000000c420606078
Mar 25 06:46:32 ip-172-31-30-204 kernel: [17858349.631988] RBP: 000000c420063b80 R08: 000000c420063b68 R09: 00000000000000ff
Mar 25 06:46:32 ip-172-31-30-204 kernel: [17858349.631989] R10: 0000000000000001 R11: 000000c4204f9440 R12: 0000000000000001
Mar 25 06:46:32 ip-172-31-30-204 kernel: [17858349.631990] R13: 0000000000000040 R14: 00000000000000d9 R15: 0000000000000000
Mar 25 06:46:32 ip-172-31-30-204 kernel: [17858349.631992] Mem-Info:
Mar 25 06:46:32 ip-172-31-30-204 kernel: [17858349.631995] active_anon:199193 inactive_anon:41 isolated_anon:0
Mar 25 06:46:32 ip-172-31-30-204 kernel: [17858349.631995]  active_file:39 inactive_file:31 isolated_file:0
Mar 25 06:46:32 ip-172-31-30-204 kernel: [17858349.631995]  unevictable:0 dirty:2 writeback:0 unstable:0
Mar 25 06:46:32 ip-172-31-30-204 kernel: [17858349.631995]  slab_reclaimable:8066 slab_unreclaimable:12259
Mar 25 06:46:32 ip-172-31-30-204 kernel: [17858349.631995]  mapped:13 shmem:190 pagetables:1921 bounce:0
Mar 25 06:46:32 ip-172-31-30-204 kernel: [17858349.631995]  free:12108 free_pcp:452 free_cma:0
Mar 25 06:46:32 ip-172-31-30-204 kernel: [17858349.631998] Node 0 active_anon:796772kB inactive_anon:164kB active_file:156kB inactive_file:124kB unevictable:0kB isolated(anon):0kB isolated(file):0kB mapped:52kB dirty:8kB writeback:0kB shmem:760kB shmem_thp: 0kB shmem_pmdmapped: 0kB anon_thp: 0kB writeback_tmp:0kB unstable:0kB all_unreclaimable? no
Mar 25 06:46:32 ip-172-31-30-204 kernel: [17858349.631999] Node 0 DMA free:4340kB min:756kB low:944kB high:1132kB active_anon:11088kB inactive_anon:8kB active_file:64kB inactive_file:8kB unevictable:0kB writepending:0kB present:15992kB managed:15908kB mlocked:0kB kernel_stack:0kB pagetables:48kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
Mar 25 06:46:32 ip-172-31-30-204 kernel: [17858349.632003] lowmem_reserve[]: 0 909 909 909 909
Mar 25 06:46:32 ip-172-31-30-204 kernel: [17858349.632005] Node 0 DMA32 free:44092kB min:44296kB low:55368kB high:66440kB active_anon:785684kB inactive_anon:156kB active_file:52kB inactive_file:628kB unevictable:0kB writepending:8kB present:1003496kB managed:960336kB mlocked:0kB kernel_stack:4208kB pagetables:7636kB bounce:0kB free_pcp:1808kB local_pcp:396kB free_cma:0kB
Mar 25 06:46:32 ip-172-31-30-204 kernel: [17858349.632009] lowmem_reserve[]: 0 0 0 0 0
Mar 25 06:46:32 ip-172-31-30-204 kernel: [17858349.632011] Node 0 DMA: 33*4kB (UME) 25*8kB (UME) 13*16kB (UME) 15*32kB (UME) 13*64kB (UME) 4*128kB (UE) 4*256kB (UME) 2*512kB (UM) 0*1024kB 0*2048kB 0*4096kB = 4412kB
Mar 25 06:46:32 ip-172-31-30-204 kernel: [17858349.632019] Node 0 DMA32: 1212*4kB (UME) 689*8kB (UME) 764*16kB (UME) 317*32kB (UME) 133*64kB (UME) 25*128kB (UME) 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 44440kB
Mar 25 06:46:32 ip-172-31-30-204 kernel: [17858349.632027] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB
Mar 25 06:46:32 ip-172-31-30-204 kernel: [17858349.632028] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
Mar 25 06:46:32 ip-172-31-30-204 kernel: [17858349.632029] 285 total pagecache pages
Mar 25 06:46:32 ip-172-31-30-204 kernel: [17858349.632030] 0 pages in swap cache
Mar 25 06:46:32 ip-172-31-30-204 kernel: [17858349.632031] Swap cache stats: add 0, delete 0, find 0/0
Mar 25 06:46:32 ip-172-31-30-204 kernel: [17858349.632032] Free swap  = 0kB
Mar 25 06:46:32 ip-172-31-30-204 kernel: [17858349.632032] Total swap = 0kB
Mar 25 06:46:32 ip-172-31-30-204 kernel: [17858349.632033] 254872 pages RAM
Mar 25 06:46:32 ip-172-31-30-204 kernel: [17858349.632034] 0 pages HighMem/MovableOnly
Mar 25 06:46:32 ip-172-31-30-204 kernel: [17858349.632034] 10811 pages reserved
Mar 25 06:46:32 ip-172-31-30-204 kernel: [17858349.632035] 0 pages cma reserved
Mar 25 06:46:32 ip-172-31-30-204 kernel: [17858349.632035] 0 pages hwpoisoned
Mar 25 06:46:32 ip-172-31-30-204 kernel: [17858349.632036] Tasks state (memory values in pages):
Mar 25 06:46:32 ip-172-31-30-204 kernel: [17858349.632036] [  pid  ]   uid  tgid total_vm      rss pgtables_bytes swapents oom_score_adj name
Mar 25 06:46:32 ip-172-31-30-204 kernel: [17858349.632042] [    389]     0   389    32672     2976   282624        0             0 systemd-journal
Mar 25 06:46:32 ip-172-31-30-204 kernel: [17858349.632045] [    408]     0   408    11118      584   118784        0         -1000 systemd-udevd
Mar 25 06:46:32 ip-172-31-30-204 kernel: [17858349.632047] [    432]     0   432    26475       60    98304        0             0 lvmetad
Mar 25 06:46:32 ip-172-31-30-204 kernel: [17858349.632049] [    554] 62583   554    35488      160   184320        0             0 systemd-timesyn
Mar 25 06:46:32 ip-172-31-30-204 kernel: [17858349.632051] [    713]   100   713    20051      187   180224        0             0 systemd-network
Mar 25 06:46:32 ip-172-31-30-204 kernel: [17858349.632053] [    747]   101   747    17696      183   180224        0             0 systemd-resolve
Mar 25 06:46:32 ip-172-31-30-204 kernel: [17858349.632054] [    890]     0   890    17845      393   180224        0             0 systemd-logind
Mar 25 06:46:32 ip-172-31-30-204 kernel: [17858349.632056] [    896]   102   896    66816      448   167936        0             0 rsyslogd
Mar 25 06:46:32 ip-172-31-30-204 kernel: [17858349.632058] [    897]     0   897    27603       82   118784        0             0 irqbalance
Mar 25 06:46:32 ip-172-31-30-204 kernel: [17858349.632060] [    898]   103   898    12580      234   139264        0          -900 dbus-daemon
Mar 25 06:46:32 ip-172-31-30-204 kernel: [17858349.632062] [    919]     0   919     1137       15    61440        0             0 acpid
Mar 25 06:46:32 ip-172-31-30-204 kernel: [17858349.632064] [    922]     0   922     7082       52   106496        0             0 atd
Mar 25 06:46:32 ip-172-31-30-204 kernel: [17858349.632065] [    923]     0   923     7961       75   102400        0             0 cron
Mar 25 06:46:32 ip-172-31-30-204 kernel: [17858349.632067] [    924]     0   924   206403      286   163840        0             0 lxcfs
Mar 25 06:46:32 ip-172-31-30-204 kernel: [17858349.632069] [    927]     0   927    42771     1978   233472        0             0 networkd-dispat
Mar 25 06:46:32 ip-172-31-30-204 kernel: [17858349.632071] [    931]     0   931    18074      190   180224        0         -1000 sshd
Mar 25 06:46:32 ip-172-31-30-204 kernel: [17858349.632073] [    937]     0   937    72220      242   212992        0             0 polkitd
Mar 25 06:46:32 ip-172-31-30-204 kernel: [17858349.632074] [    942]     0   942    46921     1979   253952        0             0 unattended-upgr
Mar 25 06:46:32 ip-172-31-30-204 kernel: [17858349.632076] [    947]     0   947     4102       37    77824        0             0 agetty
Mar 25 06:46:32 ip-172-31-30-204 kernel: [17858349.632078] [    950]     0   950     3721       33    65536        0             0 agetty
Mar 25 06:46:32 ip-172-31-30-204 kernel: [17858349.632080] [   8489]     0  8489    19170      278   188416        0             0 systemd
Mar 25 06:46:32 ip-172-31-30-204 kernel: [17858349.632082] [   8490]     0  8490    47880      618   249856        0             0 (sd-pam)
Mar 25 06:46:32 ip-172-31-30-204 kernel: [17858349.632084] [  21005]   106 21005     7148       45    98304        0             0 uuidd
Mar 25 06:46:32 ip-172-31-30-204 kernel: [17858349.632086] [  22103]     0 22103    71999      237   192512        0             0 accounts-daemon
Mar 25 06:46:32 ip-172-31-30-204 kernel: [17858349.632088] [  12048]     0 12048   217770     2487   229376        0             0 amazon-ssm-agen
Mar 25 06:46:32 ip-172-31-30-204 kernel: [17858349.632090] [  12165]     0 12165   221541     4204   262144        0             0 ssm-agent-worke
Mar 25 06:46:32 ip-172-31-30-204 kernel: [17858349.632091] [  15998]     0 15998    26998      262   245760        0             0 sshd
Mar 25 06:46:32 ip-172-31-30-204 kernel: [17858349.632093] [  30993]     0 30993   215444     4093   262144        0          -900 snapd
Mar 25 06:46:32 ip-172-31-30-204 kernel: [17858349.632095] [  32272]     0 32272     8069      198    94208        0             0 screen
Mar 25 06:46:32 ip-172-31-30-204 kernel: [17858349.632097] [  32274]     0 32274   621975    10325   376832        0             0 java
Mar 25 06:46:32 ip-172-31-30-204 kernel: [17858349.632099] [  32306]     0 32306   726790   135577  1568768        0             0 java
Mar 25 06:46:32 ip-172-31-30-204 kernel: [17858349.632101] [   6081]     0  6081    18075      183   184320        0             0 sshd
Mar 25 06:46:32 ip-172-31-30-204 kernel: [17858349.632103] [   6082]   109  6082    18075      188   176128        0             0 sshd
Mar 25 06:46:32 ip-172-31-30-204 kernel: [17858349.632104] [   6086]     0  6086     1156       17    53248        0             0 apt.systemd.dai
Mar 25 06:46:32 ip-172-31-30-204 kernel: [17858349.632106] [   6101]     0  6101     1156       35    61440        0             0 apt.systemd.dai
Mar 25 06:46:32 ip-172-31-30-204 kernel: [17858349.632108] [   6134]     0  6134    88545    10780   462848        0             0 unattended-upgr
Mar 25 06:46:32 ip-172-31-30-204 kernel: [17858349.632109] [   6153]     0  6153    88545    10796   446464        0             0 unattended-upgr
Mar 25 06:46:32 ip-172-31-30-204 kernel: [17858349.632111] [   6171]     0  6171    22038    15667   221184        0             0 dpkg
Mar 25 06:46:32 ip-172-31-30-204 kernel: [17858349.632113] [   6179]     0  6179     5346       45    81920        0             0 dpkg-deb
Mar 25 06:46:32 ip-172-31-30-204 kernel: [17858349.632114] [   6180]     0  6180     5346       45    77824        0             0 dpkg-deb
Mar 25 06:46:32 ip-172-31-30-204 kernel: [17858349.632116] [   6181]     0  6181     7395      979    94208        0             0 dpkg-deb
Mar 25 06:46:32 ip-172-31-30-204 kernel: [17858349.632117] oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),cpuset=/,mems_allowed=0,global_oom,task_memcg=/user.slice,task=java,pid=32306,uid=0
Mar 25 06:46:32 ip-172-31-30-204 kernel: [17858349.632288] Out of memory: Killed process 32306 (java) total-vm:2907160kB, anon-rss:542308kB, file-rss:0kB, shmem-rss:0kB
Mar 25 06:46:32 ip-172-31-30-204 kernel: [17858349.681431] oom_reaper: reaped process 32306 (java), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
Mar 25 06:46:51 ip-172-31-30-204 systemd[1]: Started Daily apt upgrade and clean activities.
satnam
  • 51
  • 3
  • > Is it being reasonable? Probably. Why are you running Java on 1GB? This seems an architectural miss. – Ackack Mar 25 '21 at 16:53
  • 1
    you can easily run a java webserver that only needs 50MB of memory. 1GB should be more than enough for such a usecase. – satnam Mar 25 '21 at 18:39
  • But your java webserver does not consume only 50 MB of memory. `process 32306 (java)` is the application that by itself consumes the most memory, making it the "best" target for OOM killer to select when your system is running out of available RAM. Either: add (more) restrictions to your java process so there will still be enough memory available when apt starts, or don't run apt as a batch (but run the risk that something else will request memory and the OOM killer still strikes) or upgrade to a larger instance. – Bob Mar 30 '21 at 14:40
  • Could you post the start-up options of java process, like `-Xms` and `-Xmx` ? – Mircea Vutcovici Mar 31 '21 at 14:18

1 Answers1

1

I do not know the Amazon EC2 specifically, but I had similar issues on a DigitalOcean node running Jira (which is written in Java).

One main problem with languages such as Java (Python, Ada, JavaScript, C#...) is that they use what's called garbage collection. So you never have to free an object, it frees itself once all references to it were severed.

One issue can be the garbage collector not running as often as required to keep the amount of memory as low as possible. i.e. if the server gets busy, it will attempt to handle the requests before it attempts to collect memory. This is one of the parameters you can setup when you start a Java program.

The other options I used with memory are the -Xms and such which tells Java how much it is allowed to allocate. When all that memory is allocated, the garbage collector is forced to run so as to rescue as much memory as possible and Java may still stop with an error (i.e. if you have a leak or too many connections at once requiring more memory than allowed), but running APT in parallel should not end up killing your Java app.

Finally, to fix my issue, what I've done, on the top of verifying my parameters, was to add a swap file. That uses some of your disk space, but I think it's worth not having your service just die out all the time. However, it should not be abused... If you swap all the time, it's better to get a bigger computer (double the RAM) because otherwise your system is going to be really slow. But if the auto apt updates take around 15 min., it can be worth it to swap a bit at that point instead of crashing.

Another solution is to restart your service once in a while. That way the memory that was not yet collected is going to forcibly be collected. Of course, any user who tries to connect while you restart your service are going to get an error (500 or 503).

Note: It could be that you get more hits around the time the APT auto-updates happen. If that's the case, you could look at changing the time at which the APT processes run. This is an annoying one, though, since it is defined as a "daily" on cron and all the dailies run one after the other at a given time. You'd have to move all of these processes to another time in the day or change the way the APT updates are setup.

Alexis Wilke
  • 2,210
  • 1
  • 20
  • 37