
My Docker server is very slow to start up. It runs in Swarm mode, and on every service restart (or simply a server reboot) it keeps the disk at 100% utilization and takes about an hour to start. I tried to identify what is generating all that disk activity with iotop.

First, during startup it reads the disk at 100% utilization. I straced the offending processes identified by iotop, which produced something like:

 openat(AT_FDCWD, "/v2/var/lib/docker/image/overlay2/layerdb/sha256/321a302dc92d4265f6a28b570dfb9beb32e3ad076dbe65fa20d77d21a60533b5/descriptor.json", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
 openat(AT_FDCWD, "/v2/var/lib/docker/image/overlay2/layerdb/sha256/eec71f74cdd732f5ce22e6eef945367416b2fc890441926c75bf4faeb0960b87/diff", O_RDONLY|O_CLOEXEC) = 10
 epoll_ctl(5, EPOLL_CTL_ADD, 10, {EPOLLIN|EPOLLOUT|EPOLLRDHUP|EPOLLET, {u32=1811274280, u64=140189543815720}}) = -1 EPERM (Operation not permitted)
 epoll_ctl(5, EPOLL_CTL_DEL, 10, 0xc001c7fd94) = -1 EPERM (Operation not permitted)
 fstat(10, {st_mode=S_IFREG|0644, st_size=71, ...}) = 0
 read(10, "sha256:beba085e841de99407d94ef9a"..., 583) = 71
 futex(0x558e5dc53010, FUTEX_WAKE_PRIVATE, 1) = 1
 futex(0x558e5dc52f10, FUTEX_WAKE_PRIVATE, 1) = 1
 read(10, "", 512)                       = 0
 close(10)                               = 0
 openat(AT_FDCWD, "/v2/var/lib/docker/image/overlay2/layerdb/sha256/eec71f74cdd732f5ce22e6eef945367416b2fc890441926c75bf4faeb0960b87/size", O_RDONLY|O_CLOEXEC) = 10
 epoll_ctl(5, EPOLL_CTL_ADD, 10, {EPOLLIN|EPOLLOUT|EPOLLRDHUP|EPOLLET, {u32=1811274280, u64=140189543815720}}) = -1 EPERM (Operation not permitted)
 epoll_ctl(5, EPOLL_CTL_DEL, 10, 0xc001c7fda4) = -1 EPERM (Operation not permitted)
 fstat(10, {st_mode=S_IFREG|0644, st_size=5, ...}) = 0
 read(10, "27039", 517)                  = 5
 futex(0x558e5dc53010, FUTEX_WAKE_PRIVATE, 1) = 1
 futex(0x558e5dc52f10, FUTEX_WAKE_PRIVATE, 1) = 1
 read(10, "", 512)                       = 0
 close(10)                               = 0
 openat(AT_FDCWD, "/v2/var/lib/docker/image/overlay2/layerdb/sha256/eec71f74cdd732f5ce22e6eef945367416b2fc890441926c75bf4faeb0960b87/cache-id", O_RDONLY|O_CLOEXEC) = 10
 epoll_ctl(5, EPOLL_CTL_ADD, 10, {EPOLLIN|EPOLLOUT|EPOLLRDHUP|EPOLLET, {u32=1811274280, u64=140189543815720}}) = -1 EPERM (Operation not permitted)
 epoll_ctl(5, EPOLL_CTL_DEL, 10, 0xc001c7fda4) = -1 EPERM (Operation not permitted)
 fstat(10, {st_mode=S_IFREG|0644, st_size=25, ...}) = 0

This repeats over and over for many files under /v2/var/lib/docker/overlay2. It looks like dockerd opens every file there and tries to epoll() it, only to fail because epoll() on a regular file descriptor returns EPERM. This goes on for a long time; sometimes one dockerd process stops doing it and another dockerd process resumes the disk usage. After this initial read phase the containers begin to start, but the dockerd processes then degrade into issuing many small synchronous writes:

 fdatasync(41)                           = 0
 futex(0x55c22d9fd010, FUTEX_WAKE_PRIVATE, 1) = 1
 pwrite64(41, "\365\17\0\0\0\0\0\0\20\0\4\0\0\0\0\0k\0\0\0\0\0\0\0\25\20\0\0\0\0\0\0"..., 4096, 16732160) = 4096
 fdatasync(41)                           = 0
 futex(0x55c22d9fd010, FUTEX_WAKE_PRIVATE, 1) = 1
 pwrite64(41, "\1\0\0\0\0\0\0\0\4\0\0\0\0\0\0\0\355\332\f\355\2\0\0\0\0\20\0\0\0\0\0\0"..., 4096, 4096) = 4096
 fdatasync(41)                           = 0
 futex(0x55c22d9fd010, FUTEX_WAKE_PRIVATE, 1) = 1
 pwrite64(41, "k\0\0\0\0\0\0\0\20\0\4\0\0\0\0\0\365\17\0\0\0\0\0\0\25\20\0\0\0\0\0\0"..., 4096, 438272) = 4096
 fdatasync(41)                           = 0
 futex(0x55c22d9fd010, FUTEX_WAKE_PRIVATE, 1) = 1
 futex(0x55c22d9fcf10, FUTEX_WAKE_PRIVATE, 1) = 1
 pwrite64(41, "\0\0\0\0\0\0\0\0\4\0\0\0\0\0\0\0\355\332\f\355\2\0\0\0\0\20\0\0\0\0\0\0"..., 4096, 0) = 4096
 fdatasync(41)                           = 0
 futex(0x55c22d9fd010, FUTEX_WAKE_PRIVATE, 1) = 1
 pwrite64(41, "\365\17\0\0\0\0\0\0\20\0\4\0\0\0\0\0k\0\0\0\0\0\0\0\25\20\0\0\0\0\0\0"..., 4096, 16732160) = 4096
 fdatasync(41)                           = 0
 futex(0x55c22d9fd010, FUTEX_WAKE_PRIVATE, 1) = 1
 pwrite64(41, "\1\0\0\0\0\0\0\0\4\0\0\0\0\0\0\0\355\332\f\355\2\0\0\0\0\20\0\0\0\0\0\0"..., 4096, 4096) = 4096
 fdatasync(41)                           = 0
 futex(0x55c22d9fd010, FUTEX_WAKE_PRIVATE, 1) = 1
 pwrite64(41, "k\0\0\0\0\0\0\0\20\0\4\0\0\0\0\0\365\17\0\0\0\0\0\0\25\20\0\0\0\0\0\0"..., 4096, 438272) = 4096
 fdatasync(41)                           = 0
 futex(0x55c22d9fd010, FUTEX_WAKE_PRIVATE, 1) = 1
 pwrite64(41, "\0\0\0\0\0\0\0\0\4\0\0\0\0\0\0\0\355\332\f\355\2\0\0\0\0\20\0\0\0\0\0\0"..., 4096, 0) = 4096
 fdatasync(41)                           = 0
 futex(0x55c22d9fd010, FUTEX_WAKE_PRIVATE, 1) = 1
 pwrite64(41, "\365\17\0\0\0\0\0\0\20\0\4\0\0\0\0\0k\0\0\0\0\0\0\0\25\20\0\0\0\0\0\0"..., 4096, 16732160) = 4096

After this completes, the docker command line finally becomes available and the daemon starts responding over the TCP socket.
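For reference, the traces above were collected roughly like this; the exact invocation is from memory, so treat it as a sketch:

    # show only processes that are actually doing I/O, with accumulated totals
    iotop -o -a

    # attach to the dockerd processes reported by iotop and log their file-related syscalls
    strace -f -tt -e trace=openat,read,pwrite64,fdatasync,epoll_ctl \
        -p "$(pidof dockerd)" -o /tmp/dockerd.strace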

The cmdline of the offending process(es) is:

/usr/bin/dockerd -g /v2/var/lib/docker -H tcp://192.168.224.150:2375 -H fd:// --containerd=/run/containerd/containerd.sock

And the contents of /etc/docker/daemon.json are:

{
  "experimental": true,
  "features": { "buildkit": true },
  "log-level": "warn"
}

`docker container ls` is unavailable (it hangs) until all of this finishes.
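For reference, a simple way to see when the daemon is reachable again is to poll it until it answers, along these lines:

    # keep probing the daemon (5 s per attempt) until it responds again
    until timeout 5 docker -H tcp://192.168.224.150:2375 version >/dev/null 2>&1; do
        sleep 10
    done
    echo "dockerd is responding again"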

Most of /var/log/daemon.log used to be filled with entries such as:

 Dec 21 16:41:48 fujitsu dockerd[622]: time="2020-12-21T16:41:48.960561732+01:00" level=info msg="NetworkDB stats fujitsu(c65b23ea0561) - netID:c9i2ol5fbrf8uz632946aovkz leaving:false netPeers:1 entries:6 Queue qLen:0 netMsg/s:0"
 Dec 21 16:41:48 fujitsu dockerd[622]: time="2020-12-21T16:41:48.960570860+01:00" level=info msg="NetworkDB stats fujitsu(c65b23ea0561) - netID:r2v654bgudc9xd0492f05e5rt leaving:false netPeers:1 entries:15 Queue qLen:0 netMsg/s:0"
 Dec 21 16:41:48 fujitsu dockerd[622]: time="2020-12-21T16:41:48.960579624+01:00" level=info msg="NetworkDB stats fujitsu(c65b23ea0561) - netID:nc0uqtfxfd22hf6yc2y8u7vho leaving:false netPeers:1 entries:9 Queue qLen:0 netMsg/s:0"

I fixed that by setting the Docker daemon's log level to "warn", but this has not helped with the read problem.

/v2 is a separate rotational HDD from the system partition. I've noticed that clearing /v2/var/lib/docker/overlay2 stops the issue from happening for at least a week, but then it comes back like a boomerang.

The server is used as a CI server, so a lot of Docker images are built there and they are cleaned up only infrequently. Could this be the culprit? Here are a few stats:

| Name | Value |
|------|-------|
| Number of files in /v2/var/lib/docker/overlay2 | 40993 |
| Number of files in /v2/var/lib/docker/overlay2/l | 40994 |
| Number of files in /v2/var/lib/docker/image | 328764 |
| Used inodes on /v2/var/lib/docker | 7242204 |
| Total number of files in /v2/var/lib/docker/overlay2 | 6145533 |
| Total size of all files in /v2/var/lib/docker/overlay2 | 150 GB |
| Total space used on /v2 | 520 GB |

On /v2 there is only the /v2/var/lib/docker directory; the drive is dedicated entirely to Docker. Note that the last two values are approximations: the file count was derived from the total number of used inodes minus the number of files in the remaining directories, and the size from the total disk space used minus the size of the files in the other directories, because computing either directly would take more than an hour at 100% disk read utilization (Docker was stopped for these measurements). The drive is a consumer-grade 7200 rpm SATA disk: /v2 is a 2 TB Seagate Barracuda 7200.14 (AF) ST2000DM001-1ER164 that has been online for 43808 hours. Is this simply too slow for Docker? S.M.A.R.T. reports the drive as healthy.
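For completeness, the figures above come from plain shell commands along these lines, run while Docker was stopped:

    # entries directly inside the overlay2 layer directories
    ls -1 /v2/var/lib/docker/overlay2 | wc -l
    ls -1 /v2/var/lib/docker/overlay2/l | wc -l

    # recursive file count of the image metadata directory
    find /v2/var/lib/docker/image -type f | wc -l

    # inode and space usage of the whole drive
    df -i /v2
    df -h /v2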

What am I doing wrong, and how do I fix it? I'm running docker-ce 5:19.03.12~3-0~debian-stretch and containerd.io 1.3.7-1 on Debian 9 Stretch, kernel 4.9.0-6-amd64 #1 SMP Debian 4.9.82-1+deb9u3 (2018-03-02). I've long considered moving to an SSD; would it help in the long run? Should I simply schedule a cron job to clean up the dangling images? Has anyone else run into this problem?
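The cleanup cron job I have in mind would be something along these lines; this is only a sketch, and the one-week retention is an assumption based on our CI turnaround:

    #!/bin/sh
    # candidate weekly cleanup, e.g. run from /etc/cron.weekly/ (not in use yet):
    # drop stopped containers, unused images and build cache older than one week
    docker container prune -f --filter "until=168h"
    docker image prune -af --filter "until=168h"
    docker builder prune -af --filter "until=168h"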

Is this a known bug? Should I upgrade?

  • Have you checked this [Q&A](https://stackoverflow.com/questions/47382957/docker-build-image-glob-error-error-eperm-operation-not-permitted-scandir)? The resolution there was to use something different than `/` as the `WORKDIR` in the `Dockerfile` – Henrik Pingel Dec 21 '20 at 17:48
  • I'm not having a problem with building images, I'm having problems with Docker daemon startup. – Piotr Maślanka Dec 21 '20 at 18:10
  • Just as a question of best practice, if the disk was dedicated to Docker, why is it mounted at `/v2` and not at `/var/lib/docker`? – Michael Hampton Dec 21 '20 at 20:15
  • It is a policy at my company to mount subsequent drives as /v2, /v3... It's not really my job to argue with that. – Piotr Maślanka Dec 21 '20 at 21:51
