
I have 3 clusters to deploy on containers provisioned with docker-compose; the clusters have 3, 4 and 14 nodes each. All containers are built from the same Rocky Linux 8 image and run systemd with multiple services. In each docker-compose.yml, /sys/fs/cgroup:/sys/fs/cgroup:ro is added to the volumes. However, no matter the order in which I provision the clusters, I only get systemd running in the 14 containers that were brought up first. The 15th container in the sequence, and every one after it, comes up but has no services running, and systemctl gives an error, e.g.:

# systemctl status sshd
Failed to connect to bus: No such file or directory

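For reference, the relevant section of each docker-compose.yml looks roughly like this (service and image names below are placeholders; the volumes entry is the only part taken verbatim from my files):

```yaml
services:
  node1:
    image: rockylinux8-systemd   # placeholder image name
    command: /usr/sbin/init
    volumes:
      - /sys/fs/cgroup:/sys/fs/cgroup:ro
```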
When I get into a container with docker exec I can see that the cgroup mount is in place, but ps aux only shows:

# ps aux
USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
root         1  0.0  0.0  89084  7556 ?        Ss   13:00   0:00 /usr/sbin/init
root         6  0.0  0.0  21324  3796 pts/0    Ss   13:03   0:00 bash
root        23  0.0  0.0  53952  3848 pts/0    R+   13:15   0:00 ps aux

while on a good container there is more:

# ps aux
USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
root         1  0.0  0.0 171716 10104 ?        Ss   13:00   0:00 /usr/sbin/init
root        28  0.0  0.0  89464 13432 ?        Ss   13:00   0:00 /usr/lib/systemd/systemd-journald
rpc         30  0.0  0.0  67196  5596 ?        Ss   13:00   0:00 /usr/bin/rpcbind -w -f
root        32  0.0  0.0 202388 14172 ?        Ss   13:00   0:00 /usr/sbin/sssd -i --logger=files
root        45  0.0  0.0  78648  7028 ?        Ss   13:00   0:00 /usr/sbin/sshd -D -oCiphers=aes256-gcm@openssh.com,chacha20-poly1305@openssh.com,aes256-ctr
root        47  0.0  0.0 209484  7108 ?        Ssl  13:00   0:00 /usr/sbin/rsyslogd -n
root        49  0.0  0.0 106028  3708 ?        Ssl  13:00   0:00 /usr/sbin/gssproxy -D
dbus        61  0.0  0.0  76488  5424 ?        Ss   13:00   0:00 /usr/bin/dbus-daemon --system --address=systemd: --nofork --nopidfile --systemd-activation 
root        62  0.0  0.0 212680 15516 ?        S    13:00   0:00 /usr/libexec/sssd/sssd_be --domain implicit_files --uid 0 --gid 0 --logger=files
root        63  0.0  0.1 224344 40632 ?        S    13:00   0:00 /usr/libexec/sssd/sssd_nss --uid 0 --gid 0 --logger=files
root        67  0.0  0.0  90364  7420 ?        Ss   13:01   0:00 /usr/lib/systemd/systemd-logind
root       645  0.3  0.0  30364  3796 pts/0    Ss   13:15   0:00 bash
root       663  0.0  0.0  62992  3976 pts/0    R+   13:15   0:00 ps aux

When I try to start dbus-daemon manually on a broken container, I get this error:

# /usr/bin/dbus-daemon --system --address=systemd: --nofork --nopidfile --systemd-activation
dbus-daemon[24]: Failed to start message bus: No socket received.
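As far as I understand, systemctl talks to PID 1 either over systemd's private control socket or over the system D-Bus socket, so a quick way to tell a broken container from a good one is to check whether those sockets exist (check_systemd is just a throwaway helper of mine, not part of any tool):

```shell
#!/bin/sh
# Throwaway helper: report whether the sockets systemctl/dbus clients need
# are present. /run/systemd/private is systemd's private control socket;
# /run/dbus/system_bus_socket is the system D-Bus socket.
check_systemd() {
  if [ -S /run/systemd/private ] && [ -S /run/dbus/system_bus_socket ]; then
    echo "ok"
  else
    echo "broken"
  fi
}
check_systemd
```

On the broken containers both sockets are missing, which matches the "Failed to connect to bus" error above.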

Two things puzzle me most:

  1. This used to work fine with CentOS 7 images.
  2. There appears to be a hard limit of 14 containers that can be provisioned correctly.

Re #2: if I reduce the number of running containers, the broken ones get systemd working after a restart, so it seems there is a limit set somewhere, but I have not been able to find it yet. Any comments or answers that help explain or solve this problem will be highly appreciated.
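In case anyone wants to reproduce the count, this is roughly the loop I use to tally how many running containers actually have systemd up (count_running_systemd is my own helper name; note that systemctl is-system-running also exits non-zero for a merely degraded system, which is close enough for this purpose):

```shell
#!/bin/sh
# Sketch: count the running containers in which systemd answers over the bus.
# Prints 0 when docker itself is unavailable.
count_running_systemd() {
  command -v docker >/dev/null 2>&1 || { echo 0; return; }
  n=0
  for c in $(docker ps --format '{{.Names}}' 2>/dev/null); do
    # is-system-running fails fast with "Failed to connect to bus"
    # on the broken containers.
    if docker exec "$c" systemctl is-system-running >/dev/null 2>&1; then
      n=$((n + 1))
    fi
  done
  echo "$n"
}
count_running_systemd
```

No matter how many containers are up, this never reports more than 14 for me.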

mac13k
