I have a Docker Swarm overlay network connecting 6 nodes, each running 4 containers that communicate with each other very frequently. While trying to identify the bottleneck in my network, I found that the culprit is the ksoftirqd process: driven by the Docker Swarm networking, it uses up all the CPU on the manager node and causes my app to crash. Has anyone found a workaround for this? I am trying to avoid migrating to Kubernetes.
Hardware info for the manager node (inxi output):
System: Host: caliper-latest Kernel: 4.15.0-99-generic x86_64 bits: 64 gcc: 7.5.0 Console: tty 1
Distro: Ubuntu 18.04.4 LTS
Machine: Device: kvm System: QEMU product: Standard PC (i440FX + PIIX 1996) v: pc-i440fx-2.8 serial: N/A
Mobo: N/A model: N/A serial: N/A BIOS: SeaBIOS v: 1.10.2-1 date: 04/01/2014
CPU(s): 15 Single core QEMU Virtual version 2.5+s (-SMP-) arch: P6 II rev.3 cache: 245760 KB
flags: (lm nx sse sse2 sse3) bmips: 79799
clock speeds: max: 2659 MHz 1: 2659 MHz 2: 2659 MHz 3: 2659 MHz 4: 2659 MHz 5: 2659 MHz 6: 2659 MHz
7: 2659 MHz 8: 2659 MHz 9: 2659 MHz 10: 2659 MHz 11: 2659 MHz 12: 2659 MHz 13: 2659 MHz 14: 2659 MHz
15: 2659 MHz
Graphics: Card: Cirrus Logic GD 5446 bus-ID: 00:02.0
Display Server: N/A driver: N/A tty size: 270x20 Advanced Data: N/A out of X
Network: Card: Realtek RTL-8100/8101L/8139 PCI Fast Ethernet Adapter
driver: 8139cp v: 1.3 port: c000 bus-ID: 00:03.0
IF: ens3 state: up speed: 100 Mbps duplex: full mac: <filter>
Drives: HDD Total Size: 32.2GB (27.3% used)
ID-1: /dev/vda model: N/A size: 32.2GB
Partition: ID-1: / size: 29G used: 8.2G (29%) fs: ext4 dev: /dev/vda1
RAID: No RAID devices: /proc/mdstat, md_mod kernel module present
Sensors: None detected - is lm-sensors installed and configured?
Info: Processes: 238 Uptime: 3:46 Memory: 774.3/14022.2MB Init: systemd runlevel: 5 Gcc sys: 7.5.0
Client: Shell (bash 4.4.201) inxi: 2.3.56
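
To narrow down the ksoftirqd load described above, it may help to check whether network softirqs are concentrated on a single CPU. The following is only a rough sketch, assuming a Linux host that exposes /proc/softirqs in its usual layout (a header row of CPU names followed by one row per softirq type). The function names parse_softirqs and busiest_cpu_share are made up for illustration and are not part of any Docker or kernel tooling.

```python
from pathlib import Path


def parse_softirqs(text):
    """Parse /proc/softirqs-style text into {softirq_name: [count_per_cpu, ...]}."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    cpus = lines[0].split()  # header row: CPU0 CPU1 ...
    table = {}
    for ln in lines[1:]:
        name, _, rest = ln.partition(":")
        counts = [int(x) for x in rest.split()]
        table[name.strip()] = counts[:len(cpus)]
    return table


def busiest_cpu_share(counts):
    """Return (cpu_index, fraction_of_total) for the CPU with the most events."""
    total = sum(counts)
    if total == 0:
        return 0, 0.0
    idx = max(range(len(counts)), key=counts.__getitem__)
    return idx, counts[idx] / total


if __name__ == "__main__":
    path = Path("/proc/softirqs")  # assumption: standard Linux procfs location
    if path.exists():
        table = parse_softirqs(path.read_text())
        for name in ("NET_RX", "NET_TX"):
            if name in table:
                cpu, share = busiest_cpu_share(table[name])
                print(f"{name}: CPU{cpu} handled {share:.0%} of events since boot")
```

The counters in /proc/softirqs are cumulative since boot, so comparing two readings taken a few seconds apart while the app is under load gives a better picture than a single reading. If nearly all NET_RX events land on one CPU, that would be consistent with a single ksoftirqd thread being saturated, though it does not by itself prove that the overlay network is the cause.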