Somehow, after rebooting one of my servers, Docker becomes unavailable. The log at the end of this question is the entire output of the boot in progress on that server (srv4). On one of my other machines, the boot goes on to print 4 more log lines:

Jul 22 14:39:59 Ubuntu-1804-bionic-64-minimal dockerd[26234]: time="2019-07-22T14:39:59.791008126+02:00" level=info msg="Docker daemon" commit=0dd43dd graphdriver(s)=o
Jul 22 14:39:59 Ubuntu-1804-bionic-64-minimal dockerd[26234]: time="2019-07-22T14:39:59.791131397+02:00" level=info msg="Daemon has completed initialization"
Jul 22 14:40:00 Ubuntu-1804-bionic-64-minimal dockerd[26234]: time="2019-07-22T14:40:00.944885752+02:00" level=info msg="API listen on /var/run/docker.sock"
Jul 22 14:40:00 Ubuntu-1804-bionic-64-minimal systemd[1]: Started Docker Application Container Engine.

I would really like to know what I can check to find out why my Docker engine doesn't finish starting up. Please don't tell me to reinstall Docker; that is not an option unless I can keep my existing containers.

Jul 22 18:39:17 srv4 systemd[1]: Starting Docker Application Container Engine...
Jul 22 18:39:18 srv4 dockerd[1123]: time="2019-07-22T18:39:18.634630237+02:00" level=info msg="systemd-resolved is running, so using resolvconf: /run/systemd/resolve/resolv.conf"
Jul 22 18:39:18 srv4 dockerd[1123]: time="2019-07-22T18:39:18.675035398+02:00" level=info msg="parsed scheme: \"unix\"" module=grpc
Jul 22 18:39:18 srv4 dockerd[1123]: time="2019-07-22T18:39:18.675056920+02:00" level=info msg="scheme \"unix\" not registered, fallback to default scheme" module=grpc
Jul 22 18:39:18 srv4 dockerd[1123]: time="2019-07-22T18:39:18.675512905+02:00" level=info msg="parsed scheme: \"unix\"" module=grpc
Jul 22 18:39:18 srv4 dockerd[1123]: time="2019-07-22T18:39:18.675523205+02:00" level=info msg="scheme \"unix\" not registered, fallback to default scheme" module=grpc
Jul 22 18:39:18 srv4 dockerd[1123]: time="2019-07-22T18:39:18.691598560+02:00" level=info msg="ccResolverWrapper: sending new addresses to cc: [{unix:///run/containerd/containerd.sock 0  <nil>}]" module=grpc
Jul 22 18:39:18 srv4 dockerd[1123]: time="2019-07-22T18:39:18.691639221+02:00" level=info msg="ClientConn switching balancer to \"pick_first\"" module=grpc
Jul 22 18:39:18 srv4 dockerd[1123]: time="2019-07-22T18:39:18.691650622+02:00" level=info msg="ccResolverWrapper: sending new addresses to cc: [{unix:///run/containerd/containerd.sock 0  <nil>}]" module=grpc
Jul 22 18:39:18 srv4 dockerd[1123]: time="2019-07-22T18:39:18.691675127+02:00" level=info msg="ClientConn switching balancer to \"pick_first\"" module=grpc
Jul 22 18:39:18 srv4 dockerd[1123]: time="2019-07-22T18:39:18.691705528+02:00" level=info msg="pickfirstBalancer: HandleSubConnStateChange: 0xc42073f800, CONNECTING" module=grpc
Jul 22 18:39:18 srv4 dockerd[1123]: time="2019-07-22T18:39:18.691712378+02:00" level=info msg="pickfirstBalancer: HandleSubConnStateChange: 0xc4207d4d80, CONNECTING" module=grpc
Jul 22 18:39:18 srv4 dockerd[1123]: time="2019-07-22T18:39:18.701635863+02:00" level=info msg="pickfirstBalancer: HandleSubConnStateChange: 0xc42073f800, READY" module=grpc
Jul 22 18:39:18 srv4 dockerd[1123]: time="2019-07-22T18:39:18.701638953+02:00" level=info msg="pickfirstBalancer: HandleSubConnStateChange: 0xc4207d4d80, READY" module=grpc
Jul 22 18:39:18 srv4 dockerd[1123]: time="2019-07-22T18:39:18.775587750+02:00" level=info msg="[graphdriver] using prior storage driver: overlay2"
Jul 22 18:39:19 srv4 dockerd[1123]: time="2019-07-22T18:39:19.150807807+02:00" level=info msg="Graph migration to content-addressability took 0.00 seconds"
Jul 22 18:39:19 srv4 dockerd[1123]: time="2019-07-22T18:39:19.151005388+02:00" level=warning msg="Your kernel does not support swap memory limit"
Jul 22 18:39:19 srv4 dockerd[1123]: time="2019-07-22T18:39:19.151039801+02:00" level=warning msg="Your kernel does not support cgroup rt period"
Jul 22 18:39:19 srv4 dockerd[1123]: time="2019-07-22T18:39:19.151046890+02:00" level=warning msg="Your kernel does not support cgroup rt runtime"
Jul 22 18:39:19 srv4 dockerd[1123]: time="2019-07-22T18:39:19.151466840+02:00" level=info msg="Loading containers: start."

Any hints are highly appreciated!

milovanderlinden
    Have you followed the troubleshooting steps for the Docker daemon at https://docs.docker.com/config/daemon/? If not, please go through them. – mchawre Jul 22 '19 at 17:06

2 Answers

We just had similar issues with Docker: `systemctl status docker` showed "loaded, activating" and hung. All docker commands, such as `docker ps` and `docker images`, hung as well (you could only Ctrl-C them; otherwise they would hang forever). Running `kill -9` on any of the processes associated with Docker did not help, nor did cloning the VM on which Docker was running. A system reboot had trouble shutting down the docker service (we waited for a minute, then hard-powered-off the VM on which this took place).

In the end, the solution was as follows:

  • we disabled the docker service with `systemctl disable docker`, so that it would not start by itself after a reboot and we had some breathing space,
  • `dockerd --debug` showed that docker was stuck in some kind of loop restarting one of the containers,
  • the folder /var/lib/docker contained all the containers; in our case it was not a problem to remove all of them (including the problematic one), which we did,
  • after this step, `systemctl start docker` succeeded (→ "active"), but commands like `docker ps` did not run and behaved as if the service were down,
  • we rebooted the machine and finally got a clean, properly running docker service.
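
If removing every container is not acceptable, a less destructive variant of the removal step might be to move only the problematic container's directory aside. This is a minimal sketch, not what the steps above did. It assumes the default layout `<data-root>/containers/<container-id>`, that the daemon is stopped while it runs, and that the daemon skips a container whose directory is gone. The function name `quarantine_container` and the backup location are hypothetical.

```shell
# Hypothetical helper: move one container's state directory out of the
# Docker data root, keeping a copy instead of deleting it.
# Usage: quarantine_container <data-root> <container-id> <backup-dir>
# Assumption: the docker daemon is NOT running while this is called.
quarantine_container() {
    qc_src="$1/containers/$2"
    qc_backup="$3"
    if [ ! -d "$qc_src" ]; then
        echo "no such container directory: $qc_src" >&2
        return 1
    fi
    # Create the backup location, then move the whole directory into it.
    mkdir -p "$qc_backup" || return 1
    mv -- "$qc_src" "$qc_backup/" || return 1
    echo "moved $qc_src to $qc_backup/$2"
}
```

With the service stopped, this could be called as `quarantine_container /var/lib/docker <container-id> /root/docker-quarantine`, where the ID is the one `dockerd --debug` shows looping. Moving the directory back restores the previous state, so the step is reversible.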

I hope this helps others who, for whatever reason, run into Docker being stuck in the "activating" state.

P.S. We presume the problem arose while we were experimenting with initializing Docker Swarm on two VMs, possibly with conflicting IP addresses (the same on both VMs), but we are not sure about that.

P Marecki

Something might be wrong with your Docker daemon. Please follow the Docker daemon troubleshooting steps in the Docker documentation (https://docs.docker.com/config/daemon/).

Try to:

  • Run the Docker daemon manually in the foreground with `dockerd`; it will print all the daemon logs on your screen.

  • Run the Docker daemon in debug mode with `dockerd --debug`. There are other ways to enable debugging too.

  • Force the Docker daemon to print a stack trace with `sudo kill -SIGUSR1 $(pidof dockerd)`. See the Docker daemon documentation for more info.

These steps will give you a clearer picture of what is going wrong on your system.
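
Note that `kill -SIGUSR1` prints nothing in the terminal that sends it. As far as I recall from the Docker documentation, the daemon either logs the stack trace or logs the path of a file it wrote the trace to. A minimal sketch for pulling that path out of the daemon log follows. The function name `find_stack_dump` is hypothetical, and the message wording "goroutine stacks written to" is an assumption that may differ between Docker versions.

```shell
# Hypothetical helper: read daemon log lines on stdin and print the path
# of the most recent goroutine stack dump mentioned in them, if any.
# Assumption: the daemon logs a line containing
#   goroutine stacks written to <path>
find_stack_dump() {
    # Keep only the path after the marker text, then take the last match.
    sed -n 's/.*goroutine stacks written to \([^"[:space:]]*\).*/\1/p' | tail -n 1
}
```

On a systemd host, the log could be fed in with `journalctl -u docker.service --no-pager | find_stack_dump` after sending the signal. If nothing is printed, the trace may be in the log itself, or this version of the daemon may word the message differently.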

Hope this helps.

mchawre
  • `sudo kill -SIGUSR1 $(pidof dockerd)` gives me no output and the process keeps running – milovanderlinden Jul 22 '19 at 19:15
  • Ok, `dockerd --debug` works. I can see that it loads one container with `isRunning:false`, then tries to load another with `isRunning:true` and it seems to halt there. – milovanderlinden Jul 22 '19 at 19:19
  • Whatever logs or stack traces you get, post them by opening a GitHub issue for Docker or paste them in the Docker Slack channel. The people there will definitely be able to help you more. – mchawre Jul 23 '19 at 04:19
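
To see where a captured daemon log stops, as with the stalled boot in the question, the last logged message can be extracted and checked against the "Daemon has completed initialization" line from the healthy machine. This is a minimal sketch with hypothetical function names. It assumes log lines in the `msg="..."` format shown in the question.

```shell
# Hypothetical helpers; both read dockerd log lines on stdin.

# Print the msg="..." payload of the last log line that has one.
last_daemon_msg() {
    sed -n 's/.*msg="\(.*\)"[^"]*$/\1/p' | tail -n 1
}

# Succeed only if the log shows that the daemon finished starting up.
startup_completed() {
    grep -q 'msg="Daemon has completed initialization"'
}
```

For example, `journalctl -u docker.service -b --no-pager | last_daemon_msg` would show the last step the daemon reached during the current boot.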