I have 3 clusters to deploy in containers provisioned with docker-compose; the clusters have 3, 4 and 14 nodes respectively. All containers are built from the same Rocky Linux 8 image and run systemd with multiple services. In each docker-compose.yml, /sys/fs/cgroup:/sys/fs/cgroup:ro is added to the volumes.
However, no matter in which order I provision the clusters, systemd only works on the 14 containers that were brought up first. The 15th container in the sequence, as well as every one after it, comes up but has no services running, and systemctl gives an error, e.g.:
# systemctl status sshd
Failed to connect to bus: No such file or directory
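For reference, the volumes entry in each docker-compose.yml looks roughly like the sketch below. Only the cgroup bind mount is taken from my actual files; the service name `node1` and the image name `rocky8-systemd:latest` are placeholders for illustration.

```yaml
# Minimal sketch of one service; names are placeholders, not the real ones.
services:
  node1:                              # hypothetical service name
    image: rocky8-systemd:latest      # hypothetical image name (Rocky Linux 8 with systemd)
    volumes:
      # Read-only bind mount of the host cgroup hierarchy, as described above
      - /sys/fs/cgroup:/sys/fs/cgroup:ro
```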
When I get into a broken container with docker exec, I can see that the cgroup mount is in place, but ps aux only shows:
# ps aux
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
root 1 0.0 0.0 89084 7556 ? Ss 13:00 0:00 /usr/sbin/init
root 6 0.0 0.0 21324 3796 pts/0 Ss 13:03 0:00 bash
root 23 0.0 0.0 53952 3848 pts/0 R+ 13:15 0:00 ps aux
while on a good container there is more:
# ps aux
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
root 1 0.0 0.0 171716 10104 ? Ss 13:00 0:00 /usr/sbin/init
root 28 0.0 0.0 89464 13432 ? Ss 13:00 0:00 /usr/lib/systemd/systemd-journald
rpc 30 0.0 0.0 67196 5596 ? Ss 13:00 0:00 /usr/bin/rpcbind -w -f
root 32 0.0 0.0 202388 14172 ? Ss 13:00 0:00 /usr/sbin/sssd -i --logger=files
root 45 0.0 0.0 78648 7028 ? Ss 13:00 0:00 /usr/sbin/sshd -D -oCiphers=aes256-gcm@openssh.com,chacha20-poly1305@openssh.com,aes256-ctr
root 47 0.0 0.0 209484 7108 ? Ssl 13:00 0:00 /usr/sbin/rsyslogd -n
root 49 0.0 0.0 106028 3708 ? Ssl 13:00 0:00 /usr/sbin/gssproxy -D
dbus 61 0.0 0.0 76488 5424 ? Ss 13:00 0:00 /usr/bin/dbus-daemon --system --address=systemd: --nofork --nopidfile --systemd-activation
root 62 0.0 0.0 212680 15516 ? S 13:00 0:00 /usr/libexec/sssd/sssd_be --domain implicit_files --uid 0 --gid 0 --logger=files
root 63 0.0 0.1 224344 40632 ? S 13:00 0:00 /usr/libexec/sssd/sssd_nss --uid 0 --gid 0 --logger=files
root 67 0.0 0.0 90364 7420 ? Ss 13:01 0:00 /usr/lib/systemd/systemd-logind
root 645 0.3 0.0 30364 3796 pts/0 Ss 13:15 0:00 bash
root 663 0.0 0.0 62992 3976 pts/0 R+ 13:15 0:00 ps aux
When I try to start dbus-daemon manually on a broken container, I get this error:
# /usr/bin/dbus-daemon --system --address=systemd: --nofork --nopidfile --systemd-activation
dbus-daemon[24]: Failed to start message bus: No socket received.
Two things puzzle me most:
- The same setup used to work fine, without any such issues, with CentOS 7 images.
- There seems to be a hard limit of 14 containers that can be provisioned correctly.
Regarding the second point: if I reduce the number of running containers, the broken containers get systemd working after a restart. So it seems there is a limit set somewhere, but I have not been able to find where yet. Any comments or answers that help explain or solve this problem would be highly appreciated.