Running journalctl -u docker, I noticed:

May 30 10:01:43 xxx systemd[1]: Stopping Docker Application Container Engine...
...
docker specific error log in between
...
May 30 10:01:51 xxx systemd[1]: Stopped Docker Application Container Engine...

I checked /var/log/auth.log and there were no Docker-related entries for the whole week.

I saw no termination attempts in root's shell history, nor in that of our shared user.

systemd entry:

[Unit]
Description=Docker Application Container Engine
Documentation=https://docs.docker.com
BindsTo=containerd.service
After=network-online.target firewalld.service containerd.service
Wants=network-online.target
Requires=docker.socket

[Service]
Type=notify
# the default is not to use systemd for cgroups because the delegate issues still
# exists and systemd currently does not support the cgroup feature set required
# for containers run by docker
ExecStart=/usr/bin/dockerd -H fd:// --containerd=/run/containerd/containerd.sock
ExecReload=/bin/kill -s HUP $MAINPID
TimeoutSec=0
RestartSec=2
Restart=always

# Note that StartLimit* options were moved from "Service" to "Unit" in systemd 229.
# Both the old, and new location are accepted by systemd 229 and up, so using the old location
# to make them work for either version of systemd.
StartLimitBurst=3

# Note that StartLimitInterval was renamed to StartLimitIntervalSec in systemd 230.
# Both the old, and new name are accepted by systemd 230 and up, so using the old name to make
# this option work for either version of systemd.
StartLimitInterval=60s

# Having non-zero Limit*s causes performance problems due to accounting overhead
# in the kernel. We recommend using cgroups to do container-local accounting.
LimitNOFILE=infinity
LimitNPROC=infinity
LimitCORE=infinity

# Comment TasksMax if your systemd version does not support it.
# Only systemd 226 and above support this option.
TasksMax=infinity

# set delegate yes so that systemd does not reset the cgroups of docker containers
Delegate=yes

# kill only the docker process, not all processes in the cgroup
KillMode=process

[Install]
WantedBy=multi-user.target

I don't even know why it didn't restart. It looks like someone stopped the service manually.

As I understand it, systemd should at least try to restart the service if it was stopped by a failure, which makes me think it was stopped at someone's request.

How can I figure this out?
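
One rough way to tell the two cases apart is to look at what systemd logged first for the unit. The sketch below is a minimal illustration, not a definitive diagnostic: `classify_stop` is a hypothetical helper, and the "Main process exited" pattern is an assumption about systemd's usual wording for a process that died on its own (it does not appear in the journal excerpt above).

```python
import re

# "Stopping <description>..." is what systemd prints when a stop job begins
# (this wording is taken from the journal excerpt in the question).
STOP_JOB = re.compile(r"systemd\[\d+\]: Stopping ")
# Assumed wording for a main process that exited or was killed on its own.
MAIN_EXIT = re.compile(r"Main process exited, code=(?P<code>\w+), status=(?P<status>\S+)")


def classify_stop(journal_lines):
    """Return 'crash', 'stop-job' or 'unknown' for a block of journal lines."""
    saw_stop_job = False
    for line in journal_lines:
        # The process died before any stop job was logged: treat it as a crash.
        if MAIN_EXIT.search(line) and not saw_stop_job:
            return "crash"
        if STOP_JOB.search(line):
            saw_stop_job = True
    return "stop-job" if saw_stop_job else "unknown"
```

Feeding it the output of journalctl -u docker for the relevant time window would return 'stop-job' for the excerpt above. Note that a stop job does not have to come from a person: with BindsTo=containerd.service in the unit, docker.service is also stopped when containerd.service stops, and Restart= does not apply to a unit stopped by a systemd operation.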

Docker version 19.03.8, build afacb8b7f0. Uptime 28 days.

Recently there was a memory leak that ate almost all the memory, but I saw nothing about memory in the logs.

UPD: OOM killer output in /var/log/kern.log (thanks @Abhijith):

May 30 10:01:42 compute03 kernel: [2263822.755824] [ pid ]   uid  tgid total_vm      rss pgtables_bytes swapents oom_score_adj name
May 30 10:01:42 compute03 kernel: [2263822.755829] [  404]     0   404    71910        1   540672     3377             0 systemd-journal
May 30 10:01:42 compute03 kernel: [2263822.755830] [  414]     0   414    10905        0   122880      372         -1000 systemd-udevd
May 30 10:01:42 compute03 kernel: [2263822.755831] [  417]     0   417    24427        0    94208       55             0 lvmetad
May 30 10:01:42 compute03 kernel: [2263822.755833] [  606] 62583   606    35484        0   184320      187             0 systemd-timesyn
May 30 10:01:42 compute03 kernel: [2263822.755834] [  655]   100   655    18265        0   167936      385             0 systemd-network
May 30 10:01:42 compute03 kernel: [2263822.755835] [  678]   101   678    17693        0   184320      200             0 systemd-resolve
May 30 10:01:42 compute03 kernel: [2263822.755836] [  890]     0   890    27604       20   118784       64             0 irqbalance
May 30 10:01:42 compute03 kernel: [2263822.755837] [  898]     0   898    17670        0   184320      218             0 systemd-logind
May 30 10:01:42 compute03 kernel: [2263822.755838] [  899]     0   899   169538        0   147456      219             0 lxcfs
May 30 10:01:42 compute03 kernel: [2263822.755839] [  901]   103   901    12544        0   143360      199          -900 dbus-daemon
May 30 10:01:42 compute03 kernel: [2263822.755840] [  905]     0   905     7507        0   102400       72             0 cron
May 30 10:01:42 compute03 kernel: [2263822.755841] [  907]     0   907     7083        0   106496       58             0 atd
May 30 10:01:42 compute03 kernel: [2263822.755842] [  908]     0   908    71588        0   192512      260             0 accounts-daemon
May 30 10:01:42 compute03 kernel: [2263822.755843] [  909]   102   909    65758        0   172032      461             0 rsyslogd
May 30 10:01:42 compute03 kernel: [2263822.755844] [  916]     0   916    42372        0   233472     2022             0 networkd-dispat
May 30 10:01:42 compute03 kernel: [2263822.755845] [  921]     0   921   301259        0   348160     6201             0 containerd
May 30 10:01:42 compute03 kernel: [2263822.755846] [  923]   112   923    26804        0   233472      291             0 zabbix_agentd
May 30 10:01:42 compute03 kernel: [2263822.755847] [  929]     0   929    46488        0   262144     2000             0 unattended-upgr
May 30 10:01:42 compute03 kernel: [2263822.755848] [  931]     0   931   300744      120   495616    12158          -500 dockerd
May 30 10:01:42 compute03 kernel: [2263822.755849] [  944]   112   944    28924        1   262144      307             0 zabbix_agentd
May 30 10:01:42 compute03 kernel: [2263822.755850] [  945]   112   945    29478       11   270336      357             0 zabbix_agentd
May 30 10:01:42 compute03 kernel: [2263822.755852] [  946]   112   946    29478        0   270336      369             0 zabbix_agentd
May 30 10:01:42 compute03 kernel: [2263822.755853] [  947]   112   947    29478       12   270336      355             0 zabbix_agentd
May 30 10:01:42 compute03 kernel: [2263822.755854] [  952]     0   952     3666        0    73728       38             0 agetty
May 30 10:01:42 compute03 kernel: [2263822.755856] [  954]   112   954    27903        0   258048      360             0 zabbix_agentd
May 30 10:01:42 compute03 kernel: [2263822.755857] [  958]     0   958     3722        0    77824       36             0 agetty
May 30 10:01:42 compute03 kernel: [2263822.755858] [  960]     0   960    18075        1   188416      191         -1000 sshd
May 30 10:01:42 compute03 kernel: [2263822.755859] [  961]     0   961    72221        0   212992      274             0 polkitd
May 30 10:01:42 compute03 kernel: [2263822.755860] [ 6213]  1000  6213    19225        0   196608      346             0 systemd
May 30 10:01:42 compute03 kernel: [2263822.755861] [ 6214]  1000  6214    27956        0   245760      614             0 (sd-pam)
May 30 10:01:42 compute03 kernel: [2263822.755862] [ 6307]  1000  6307    63356      313   385024    12640             0 service
May 30 10:01:42 compute03 kernel: [2263822.755863] [ 3600]     0  3600    26925        0    65536      265          -999 containerd-shim
May 30 10:01:42 compute03 kernel: [2263822.755864] [ 3628]   999  3628   818153   332342  6262784   394513             0 python
May 30 10:01:42 compute03 kernel: [2263822.755865] [ 3703]     0  3703    26925        0    73728      271          -999 containerd-shim
May 30 10:01:42 compute03 kernel: [2263822.755875] [ 3732]   999  3732   818151   288134  6258688   438719             0 python
May 30 10:01:42 compute03 kernel: [2263822.755876] [ 4172]     0  4172    26925        0    73728      271          -999 containerd-shim
May 30 10:01:42 compute03 kernel: [2263822.755878] [ 4196]   999  4196   324489    77683  2314240   156754             0 python
May 30 10:01:42 compute03 kernel: [2263822.755879] [ 4332]     0  4332    27277        0    77824      318          -999 containerd-shim
May 30 10:01:42 compute03 kernel: [2263822.755880] [ 4362]   999  4362   286331   192099  2007040     4441             0 python
May 30 10:01:42 compute03 kernel: [2263822.755881] [ 4431]     0  4431    26925        0    73728      243          -999 containerd-shim
May 30 10:01:42 compute03 kernel: [2263822.755882] [ 4460]   999  4460   152545    57219   913408     5807             0 python
May 30 10:01:42 compute03 kernel: [2263822.755883] [ 4515]  1000  4515   354203        0   565248    13231             0 service
May 30 10:01:42 compute03 kernel: [2263822.755884] Out of memory: Kill process 3628 (python) score 353 or sacrifice child
May 30 10:01:42 compute03 kernel: [2263822.757606] Killed process 3628 (python) total-vm:3272612kB, anon-rss:1329368kB, file-rss:0kB, shmem-rss:0kB
May 30 10:01:42 compute03 kernel: [2263822.899423] oom_reaper: reaped process 3628 (python), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB

As I can see, dockerd has an oom_score_adj of -500, but in the next OOM killer run (30 minutes later) dockerd was no longer present in the table.
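
To compare the process tables of successive OOM killer runs, the dump can be parsed into records. This is a minimal sketch that assumes the exact line layout shown above; `parse_oom_table`, `has_process` and `rss_kib` are hypothetical helper names, and the conversion of the rss column assumes 4 KiB pages.

```python
import re

# One row of the OOM killer process table, in the layout shown above:
# [ pid ] uid tgid total_vm rss pgtables_bytes swapents oom_score_adj name
ROW = re.compile(
    r"kernel: \[\s*[\d.]+\] \[\s*(?P<pid>\d+)\]\s+(?P<uid>\d+)\s+(?P<tgid>\d+)\s+"
    r"(?P<total_vm>\d+)\s+(?P<rss>\d+)\s+(?P<pgtables_bytes>\d+)\s+"
    r"(?P<swapents>\d+)\s+(?P<oom_score_adj>-?\d+)\s+(?P<name>.+?)\s*$"
)


def parse_oom_table(lines):
    """Return one dict per table row; header and summary lines are skipped."""
    rows = []
    for line in lines:
        m = ROW.search(line)
        if m:
            rows.append(
                {k: (v if k == "name" else int(v)) for k, v in m.groupdict().items()}
            )
    return rows


def has_process(rows, name):
    """True if a process with this exact name appears in the parsed table."""
    return any(r["name"] == name for r in rows)


def rss_kib(row, page_kib=4):
    """Resident memory in KiB; rss is counted in pages (4 KiB assumed)."""
    return row["rss"] * page_kib
```

Running `has_process(rows, "dockerd")` on each dump separately shows in which run dockerd disappears. For PID 3628 above, `rss_kib` gives 1329368, which matches the anon-rss value in the "Killed process" line.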

Every earlier line matched by grepping for "docker" is just an info-level log. There are no errors before the incident starts.

kAldown

1 Answer

Have you checked whether the service was killed because of a memory issue? The Linux OOM (out-of-memory) killer automatically kills processes when the system runs out of memory, i.e. when RAM and swap have filled up. Please run the command below:

grep docker /var/log/kern.log

If that file is not available, look in /var/log/messages instead.

This is just an assumption.
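
Grepping for "docker" only finds lines that mention Docker by name, so it can miss kills of other processes. As a rough complement, the sketch below lists every process the OOM killer reports as killed. `find_oom_kills` is a hypothetical helper, and the pattern assumes the "Killed process <pid> (<name>)" wording that the kernel prints.

```python
import re

# Matches e.g. "Killed process 3628 (python) total-vm:3272612kB, ..."
KILLED = re.compile(r"Killed process (?P<pid>\d+) \((?P<name>[^)]+)\)")


def find_oom_kills(lines):
    """Return (pid, name) for every 'Killed process' line, in log order."""
    kills = []
    for line in lines:
        m = KILLED.search(line)
        if m:
            kills.append((int(m.group("pid")), m.group("name")))
    return kills


# Possible usage (reading the log usually requires root):
#   with open("/var/log/kern.log", errors="replace") as fh:
#       for pid, name in find_oom_kills(fh):
#           print(pid, name)
```

If dockerd or containerd ever shows up in that list, the OOM killer took it down directly; if only other processes appear, the stop has to be explained some other way.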

Abhijith
  • Thank you, really useful information. I can see exactly where the *oom killer* starts. But *docker* has a *-500 score* and only python processes are being killed. Yet there is *no* docker process the next time the *oom killer* dumps the table =(. Added to the main question. – kAldown Jun 02 '20 at 13:12