
It is a Debian 9 container running on a Debian 9 host.

Every now and then the container grinds to a halt. It usually happens with a slow task such as apticron or munin. My last test was an idle container on an idle server, and the same thing happened again: over 300 CRON processes.

The host copes with all these tasks and more; the problems only start inside a container.

Any tips or suggestions? Thanks

root       816  0.0  0.0  54812  3860 ?        Ss   Aug28   0:00 [lxcmonitor] /srv/lxc test-01
root       908  0.0  0.0  28236  4428 ?        Ss   Aug28   0:00  \_ /sbin/init
root      1266  0.0  0.0  32968  3956 ?        Ss   Aug28   0:00      \_ /lib/systemd/systemd-journald
root      1445  0.0  0.0  37080  2700 ?        Ss   Aug28   0:00      \_ /sbin/rpcbind -w
statd     1466  0.0  0.0  37280  2908 ?        Ss   Aug28   0:00      \_ /sbin/rpc.statd
root      1491  0.0  0.0  27568   228 ?        Ss   Aug28   0:00      \_ /usr/sbin/rpc.idmapd
root      1496  0.0  0.0  27504  2768 ?        Ss   Aug28   0:00      \_ /usr/sbin/cron -f
root      9455  0.0  0.0  52788  2800 ?        S    Aug28   0:00      |   \_ /usr/sbin/CRON -f
root     10293  0.0  0.0  52788  2800 ?        S    Aug28   0:00      |   \_ /usr/sbin/CRON -f
root     11147  0.0  0.0  52788  2800 ?        S    Aug28   0:00      |   \_ /usr/sbin/CRON -f
root     12047  0.0  0.0  52788  2800 ?        S    Aug28   0:00      |   \_ /usr/sbin/CRON -f
root     12881  0.0  0.0  52788  2800 ?        S    Aug28   0:00      |   \_ /usr/sbin/CRON -f
root     13746  0.0  0.0  52788  2800 ?        S    Aug28   0:00      |   \_ /usr/sbin/CRON -f
root     14592  0.0  0.0  52788  2800 ?        S    Aug28   0:00      |   \_ /usr/sbin/CRON -f
root     15407  0.0  0.0  52788  2800 ?        S    Aug28   0:00      |   \_ /usr/sbin/CRON -f
root     15425  0.0  0.0  52788  2800 ?        S    Aug28   0:00      |   \_ /usr/sbin/CRON -f
root     16286  0.0  0.0  52788  2800 ?        S    Aug28   0:00      |   \_ /usr/sbin/CRON -f
root     17115  0.0  0.0  52788  2800 ?        S    Aug28   0:00      |   \_ /usr/sbin/CRON -f
root     17958  0.0  0.0  52788  2800 ?        S    Aug28   0:00      |   \_ /usr/sbin/CRON -f
root     18838  0.0  0.0  52788  2800 ?        S    Aug28   0:00      |   \_ /usr/sbin/CRON -f
root     19645  0.0  0.0  52788  2800 ?        S    Aug28   0:00      |   \_ /usr/sbin/CRON -f
root     19671  0.0  0.0  52788  2800 ?        S    Aug28   0:00      |   \_ /usr/sbin/CRON -f
root     20522  0.0  0.0  52788  2800 ?        S    Aug28   0:00      |   \_ /usr/sbin/CRON -f
root     21377  0.0  0.0  52788  2800 ?        S    Aug28   0:00      |   \_ /usr/sbin/CRON -f
root     22211  0.0  0.0  52788  2800 ?        S    Aug28   0:00      |   \_ /usr/sbin/CRON -f
root     23052  0.0  0.0  52788  2800 ?        S    Aug28   0:00      |   \_ /usr/sbin/CRON -f
root     23900  0.0  0.0  52788  2800 ?        S    Aug28   0:00      |   \_ /usr/sbin/CRON -f
root     24743  0.0  0.0  52788  2800 ?        S    Aug28   0:00      |   \_ /usr/sbin/CRON -f
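
To keep an eye on how fast the stuck children pile up, a small helper along these lines could count them. This is only a sketch: it assumes the input is process-tree text in the same shape as the listing above, and the function name `count_cron_children` and the `SAMPLE` text are made up for illustration.

```python
# Sketch: count forked "/usr/sbin/CRON -f" children in ps-style output.
# Assumption: each process is on its own line and the command is the last field,
# as in the listing above. The parent "/usr/sbin/cron -f" (lowercase) is not counted.

SAMPLE = """\
root      1496  0.0  0.0  27504  2768 ?        Ss   Aug28   0:00      \\_ /usr/sbin/cron -f
root      9455  0.0  0.0  52788  2800 ?        S    Aug28   0:00      |   \\_ /usr/sbin/CRON -f
root     10293  0.0  0.0  52788  2800 ?        S    Aug28   0:00      |   \\_ /usr/sbin/CRON -f
"""


def count_cron_children(ps_output):
    """Return the number of lines whose command is exactly '/usr/sbin/CRON -f'."""
    count = 0
    for line in ps_output.splitlines():
        # Case-sensitive match, so the lowercase parent daemon is skipped.
        if line.rstrip().endswith("/usr/sbin/CRON -f"):
            count += 1
    return count
```

Feeding it the text of a live process listing (for example captured with `subprocess.run(["ps", "auxf"], capture_output=True, text=True).stdout`) and logging the result every few minutes would show whether the count grows steadily or jumps at a particular time.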

Just happened again. https://i.stack.imgur.com/M2Wr7.png

Tried to SSH to host

packet_write_wait: Connection to 192.168.0.16 port 22: Broken pipe

SSH to the container worked. I rebooted the container and pinged it until it responded. SSH to the container still works, and now SSH to the host is working fine too. Heh. This is driving me bonkers.

More munin graphs

Dax
  • This is a problem related to load-balancing system design. Can you describe it in more detail so people can help? Is there a microservice? What do you use to manage the containers? – Long Vũ Sep 03 '18 at 07:31
  • One container is running Munin for 5 servers, so it is not very busy. The other one is just a Samba file server that is not in use. According to Munin, max CPU usage over a 24-hour period is about 2%. This box is sitting idle, waiting to go into production. It is managed with plain LXC commands on a Debian 9 server – Dax Sep 04 '18 at 07:30
  • Are you sure it's "grinding to a halt"? It seems to me as if the script is simply freezing. How often is the cron job being executed? Can you put a check at the start of the job that makes it exit if another instance of the same is already running? – Law29 Sep 04 '18 at 22:13
  • It is stock standard munin and apticron installed via apt. Nothing special. What is weird is that on the host it is fine, but in the container it is not. I feel like it is LXC related but not sure where to look for the issue. – Dax Sep 05 '18 at 15:05
  • See my edit about it happening again – Dax Sep 05 '18 at 15:16
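
For the check Law29 suggests (exit at the start of the job if another instance is already running), a minimal sketch using an advisory file lock might look like the following. It assumes a Linux system where `fcntl.flock` is available; the helper name `try_lock` and the lock file path are hypothetical and not part of munin or apticron.

```python
# Sketch: skip this run if a previous instance of the same job still holds the lock.
# Assumption: Linux/Unix only (fcntl). The lock path below is purely illustrative.
import fcntl
import os
import tempfile


def try_lock(path):
    """Return an open file holding an exclusive lock, or None if it is already locked."""
    handle = open(path, "a")  # "a" creates the file if missing without truncating it
    try:
        # LOCK_NB makes flock fail immediately instead of waiting for the lock.
        fcntl.flock(handle, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        handle.close()
        return None
    return handle  # keep this object open for as long as the job runs


if __name__ == "__main__":
    lock_path = os.path.join(tempfile.gettempdir(), "example-cron-job.lock")
    lock = try_lock(lock_path)
    if lock is None:
        raise SystemExit("previous run still active, skipping this run")
    # ... the actual job would run here; the lock is released when the process exits.
```

This would not explain why the jobs hang inside the container, but it should stop hundreds of instances from stacking up while the cause is being tracked down.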
