One of my ICP nodes appears to be running, but the services on that node are unresponsive and at times return a 504 Gateway Timeout.

When I SSH into the unresponsive node and run journalctl -u kubelet -f, I see error messages such as:

transport: dial unix /var/run/docker/containerd/docker-containerd.sock: connect: connection refused

Furthermore, when I run top, I see dockerd using an unusually high percentage of my CPU.

What is causing this behavior and how can I return my node to its normal working condition?

James Young IBM

1 Answer

These errors may be caused by a known Docker issue in which the Docker daemon keeps using an old containerd reference even after the containerd daemon has been restarted. The defect sends the Docker daemon into an internal error loop that consumes a large amount of CPU and logs a high volume of errors, which is consistent with both the high dockerd CPU usage and the connection-refused messages in the kubelet log. For more information, see the "Refresh containerd remotes on containerd restarted" pull request against the Moby project.
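To check whether a node is showing this symptom, you can test whether the containerd socket named in the kubelet error accepts connections. The following is a minimal Python sketch and is not part of Docker or ICP: the function name socket_accepts_connections is hypothetical, and the default path is simply the one from the error message in the question (other Docker versions may place the socket elsewhere).

```python
import socket

# Socket path copied from the kubelet error in the question; it may differ on other Docker versions.
CONTAINERD_SOCK = "/var/run/docker/containerd/docker-containerd.sock"


def socket_accepts_connections(path: str = CONTAINERD_SOCK, timeout: float = 2.0) -> bool:
    """Return True if something is listening on the Unix domain socket at `path`.

    This only tests that a connection can be opened; it does not speak the
    containerd gRPC protocol.
    """
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        try:
            sock.connect(path)
        except OSError:
            # Missing socket file, connection refused, permission denied and
            # timeouts all raise OSError subclasses; treat them as "not reachable".
            return False
    return True


if __name__ == "__main__":
    if socket_accepts_connections():
        print("containerd socket accepts connections")
    else:
        print("containerd socket refused or missing (matches the kubelet error)")
```

A socket that accepts connections does not by itself prove dockerd has recovered, since the defect lies in dockerd holding a stale reference; the check only mirrors the dial attempt the kubelet is failing on.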

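To work around this issue, restart the Docker service on the node with the host operating system's service manager (for example, by running sudo systemctl restart docker on systemd-based hosts). After a short time, the services on the node should start responding again.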
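If you want to script the workaround, here is a minimal Python sketch under stated assumptions: a systemd-based host (so systemctl restart docker is the host OS command), root privileges, and "docker info succeeds" as a rough readiness signal. The function names docker_responds and restart_docker_and_wait are hypothetical; run and probe are parameters only so the polling logic can be exercised without touching a real node.

```python
import subprocess
import time
from typing import Callable


def docker_responds() -> bool:
    """Return True if `docker info` succeeds, i.e. dockerd is answering API requests."""
    try:
        result = subprocess.run(
            ["docker", "info"],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=30,  # a wedged dockerd may never answer
        )
    except (FileNotFoundError, subprocess.TimeoutExpired):
        return False
    return result.returncode == 0


def restart_docker_and_wait(
    timeout: float = 120.0,
    interval: float = 2.0,
    run: Callable[..., object] = subprocess.run,
    probe: Callable[[], bool] = docker_responds,
) -> bool:
    """Restart the docker service with systemd, then poll until `probe` succeeds.

    Assumes a systemd-based host and root privileges. Returns True if Docker
    responded within `timeout` seconds, False otherwise.
    """
    run(["systemctl", "restart", "docker"], check=True)
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if probe():
            return True
        time.sleep(interval)
    return False
```

On the affected node, calling restart_docker_and_wait() as root and getting True back is a reasonable point to rerun journalctl -u kubelet -f and confirm the connection-refused errors have stopped.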
