
I have a small 1-manager, 3-worker cluster set up to pilot a few things. It is running swarm orchestration and can spin up services across the cluster from any stack YAML and serve the web apps through the ingress network. I've made no changes to the default yum installation of docker-ce: a vanilla installation, with no configuration changes on any of the nodes.

There is, however, an issue with inter-service communication over other overlay networks. I create a Docker overlay network testnet with the --attachable flag, then attach an nginx container (named nginx1) to it on node-1 and a netshoot container (named netshoot1) to it on manager-1.
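For reference, the setup can be reproduced with commands along these lines (a sketch; `nicolaka/netshoot` is the usual netshoot image, which I'm assuming here):

```shell
# On the manager: create an attachable overlay network
docker network create -d overlay --attachable testnet

# On node-1: run nginx attached to the overlay
docker run -d --name nginx1 --network testnet nginx

# On manager-1: run an interactive netshoot container for debugging
docker run -it --rm --name netshoot1 --network testnet nicolaka/netshoot
```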

I can then ping nginx1 from netshoot1 and vice versa. I can observe these packet exchanges over tcpdump on both nodes.

# tcpdump -vvnn -i any src 10.1.72.70 and dst 10.1.72.71 and port 4789
00:20:39.302561 IP (tos 0x0, ttl 64, id 49791, offset 0, flags [none], proto UDP (17), length 134)
    10.1.72.70.53237 > 10.1.72.71.4789: [udp sum ok] VXLAN, flags [I] (0x08), vni 4101
IP (tos 0x0, ttl 64, id 20598, offset 0, flags [DF], proto ICMP (1), length 84)
    10.0.5.18 > 10.0.5.24: ICMP echo request, id 21429, seq 1, length 64

Here you can see netshoot1 (10.0.5.18) pinging nginx1 (10.0.5.24) - the echo is successful.

However, if I then run # curl -v nginx1:80, the whole thing times out.

Using tcpdump, I can see the packets leave manager-1 node, but they never arrive on node-1.

00:22:22.809057 IP (tos 0x0, ttl 64, id 42866, offset 0, flags [none], proto UDP (17), length 110)
    10.1.72.70.53764 > 10.1.72.71.4789: [bad udp cksum 0x5b97 -> 0x697d!] VXLAN, flags [I] (0x08), vni 4101
IP (tos 0x0, ttl 64, id 43409, offset 0, flags [DF], proto TCP (6), length 60)
    10.0.5.18.53668 > 10.0.5.24.80: Flags [S], cksum 0x1e58 (incorrect -> 0x2c3e), seq 1616566654, win 28200, options [mss 1410,sack OK,TS val 913132903 ecr 0,nop,wscale 7], length 0

These are VMs running in an in-house datacenter on VMware. The networking team says the network firewall shouldn't be blocking or inspecting them, as the IPs are on the same subnet.

Is this an issue with the Docker configuration? Iptables?

OS: RHEL 8

Docker CE: 20.10.2

containerd: 1.4.3

IPTABLES on manager-1

Chain INPUT (policy DROP 0 packets, 0 bytes)
num   pkts bytes target     prot opt in     out     source               destination
1    9819K 2542M ACCEPT     all  --  *      *       0.0.0.0/0            0.0.0.0/0            state RELATED,ESTABLISHED
2        8   317 ACCEPT     icmp --  *      *       0.0.0.0/0            0.0.0.0/0            icmptype 255
3      473 33064 ACCEPT     all  --  lo     *       0.0.0.0/0            0.0.0.0/0
4        0     0 DROP       all  --  *      *       127.0.0.0/8          0.0.0.0/0
5      116  6192 ACCEPT     tcp  --  *      *       0.0.0.0/0            0.0.0.0/0            state NEW tcp dpt:22
6     351K   21M ACCEPT     tcp  --  *      *       0.0.0.0/0            0.0.0.0/0            source IP range 10.1.72.71-10.1.72.73 state NEW multiport dports 2377,7946 
7      435 58400 ACCEPT     udp  --  *      *       0.0.0.0/0            0.0.0.0/0            source IP range 10.1.72.71-10.1.72.73 state NEW multiport dports 7946,4789
8    17142 1747K REJECT     all  --  *      *       0.0.0.0/0            0.0.0.0/0            reject-with icmp-host-prohibited

Chain FORWARD (policy DROP 8 packets, 384 bytes)
num   pkts bytes target     prot opt in     out     source               destination
1    14081   36M DOCKER-USER  all  --  *      *       0.0.0.0/0            0.0.0.0/0
2    14081   36M DOCKER-INGRESS  all  --  *      *       0.0.0.0/0            0.0.0.0/0
3     267K  995M DOCKER-ISOLATION-STAGE-1  all  --  *      *       0.0.0.0/0            0.0.0.0/0
4    39782  121M ACCEPT     all  --  *      docker0  0.0.0.0/0            0.0.0.0/0            ctstate RELATED,ESTABLISHED
5     1598 95684 DOCKER     all  --  *      docker0  0.0.0.0/0            0.0.0.0/0
6    41470  717M ACCEPT     all  --  docker0 !docker0  0.0.0.0/0            0.0.0.0/0
7        0     0 ACCEPT     all  --  docker0 docker0  0.0.0.0/0            0.0.0.0/0
8    90279   23M ACCEPT     all  --  *      docker_gwbridge  0.0.0.0/0            0.0.0.0/0            ctstate RELATED,ESTABLISHED
9        5   300 DOCKER     all  --  *      docker_gwbridge  0.0.0.0/0            0.0.0.0/0
10   94041  134M ACCEPT     all  --  docker_gwbridge !docker_gwbridge  0.0.0.0/0            0.0.0.0/0
11       0     0 DROP       all  --  docker_gwbridge docker_gwbridge  0.0.0.0/0            0.0.0.0/0

Chain OUTPUT (policy ACCEPT 11M packets, 2365M bytes)
num   pkts bytes target     prot opt in     out     source               destination

Chain DOCKER (2 references)
num   pkts bytes target     prot opt in     out     source               destination
1     1598 95684 ACCEPT     tcp  --  !docker0 docker0  0.0.0.0/0            172.17.0.2           tcp dpt:5000

Chain DOCKER-ISOLATION-STAGE-1 (1 references)
num   pkts bytes target     prot opt in     out     source               destination
1    41470  717M DOCKER-ISOLATION-STAGE-2  all  --  docker0 !docker0  0.0.0.0/0            0.0.0.0/0
2    93853  133M DOCKER-ISOLATION-STAGE-2  all  --  docker_gwbridge !docker_gwbridge  0.0.0.0/0            0.0.0.0/0
3     267K  995M RETURN     all  --  *      *       0.0.0.0/0            0.0.0.0/0

Chain DOCKER-USER (1 references)
num   pkts bytes target     prot opt in     out     source               destination
1    1033K 1699M RETURN     all  --  *      *       0.0.0.0/0            0.0.0.0/0

Chain DOCKER-INGRESS (1 references)
num   pkts bytes target     prot opt in     out     source               destination
1        0     0 ACCEPT     tcp  --  *      *       0.0.0.0/0            0.0.0.0/0            tcp dpt:8502
2        0     0 ACCEPT     tcp  --  *      *       0.0.0.0/0            0.0.0.0/0            state RELATED,ESTABLISHED tcp spt:8502
3     267K  995M RETURN     all  --  *      *       0.0.0.0/0            0.0.0.0/0

Chain DOCKER-ISOLATION-STAGE-2 (2 references)
num   pkts bytes target     prot opt in     out     source               destination
1        0     0 DROP       all  --  *      docker0  0.0.0.0/0            0.0.0.0/0
2        0     0 DROP       all  --  *      docker_gwbridge  0.0.0.0/0            0.0.0.0/0
3     135K  851M RETURN     all  --  *      *       0.0.0.0/0            0.0.0.0/0

IPTABLES on node-1

Chain INPUT (policy DROP 0 packets, 0 bytes)
num   pkts bytes target     prot opt in     out     source               destination
1    6211K 3343M ACCEPT     all  --  *      *       0.0.0.0/0            0.0.0.0/0            state RELATED,ESTABLISHED
2        7   233 ACCEPT     icmp --  *      *       0.0.0.0/0            0.0.0.0/0            icmptype 255
3      471 32891 ACCEPT     all  --  lo     *       0.0.0.0/0            0.0.0.0/0
4        0     0 DROP       all  --  *      *       127.0.0.0/8          0.0.0.0/0
5       84  4504 ACCEPT     tcp  --  *      *       0.0.0.0/0            0.0.0.0/0            state NEW tcp dpt:22 /* ssh from anywhere */
6    26940 1616K ACCEPT     tcp  --  *      *       10.1.72.70           0.0.0.0/0            state NEW multiport dports 7946 /* docker swarm cluster comm- manager,node2,3 */
7    31624 1897K ACCEPT     tcp  --  *      *       10.1.72.72           0.0.0.0/0            state NEW multiport dports 7946 /* docker swarm cluster comm- manager,node2,3 */
8    30583 1835K ACCEPT     tcp  --  *      *       10.1.72.73           0.0.0.0/0            state NEW multiport dports 7946 /* docker swarm cluster comm- manager,node2,3 */
9      432 58828 ACCEPT     udp  --  *      *       10.1.72.70           0.0.0.0/0            state NEW multiport dports 7946,4789 /* docker swarm cluster comm and overlay netw- manager,node2,3 */
10      10  1523 ACCEPT     udp  --  *      *       10.1.72.72           0.0.0.0/0            state NEW multiport dports 7946,4789 /* docker swarm cluster comm and overlay netw- manager,node2,3 */
11       7  1159 ACCEPT     udp  --  *      *       10.1.72.73           0.0.0.0/0            state NEW multiport dports 7946,4789 /* docker swarm cluster comm and overlay netw- manager,node2,3 */
12   17172 1749K REJECT     all  --  *      *       0.0.0.0/0            0.0.0.0/0            reject-with icmp-host-prohibited

Chain FORWARD (policy DROP 19921 packets, 1648K bytes)
num   pkts bytes target     prot opt in     out     source               destination
1    23299   22M DOCKER-USER  all  --  *      *       0.0.0.0/0            0.0.0.0/0
2    23299   22M DOCKER-INGRESS  all  --  *      *       0.0.0.0/0            0.0.0.0/0
3     787K 1473M DOCKER-ISOLATION-STAGE-1  all  --  *      *       0.0.0.0/0            0.0.0.0/0
4        0     0 ACCEPT     all  --  *      docker0  0.0.0.0/0            0.0.0.0/0            ctstate RELATED,ESTABLISHED
5        0     0 DOCKER     all  --  *      docker0  0.0.0.0/0            0.0.0.0/0
6        0     0 ACCEPT     all  --  docker0 !docker0  0.0.0.0/0            0.0.0.0/0
7        0     0 ACCEPT     all  --  docker0 docker0  0.0.0.0/0            0.0.0.0/0
8     386K  220M ACCEPT     all  --  *      docker_gwbridge  0.0.0.0/0            0.0.0.0/0            ctstate RELATED,ESTABLISHED
9        0     0 DOCKER     all  --  *      docker_gwbridge  0.0.0.0/0            0.0.0.0/0
10    402K 1254M ACCEPT     all  --  docker_gwbridge !docker_gwbridge  0.0.0.0/0            0.0.0.0/0
11       0     0 DROP       all  --  docker_gwbridge docker_gwbridge  0.0.0.0/0            0.0.0.0/0

Chain OUTPUT (policy ACCEPT 8193K packets, 2659M bytes)
num   pkts bytes target     prot opt in     out     source               destination

Chain DOCKER-INGRESS (1 references)
num   pkts bytes target     prot opt in     out     source               destination
1        0     0 ACCEPT     tcp  --  *      *       0.0.0.0/0            0.0.0.0/0            tcp dpt:8502
2        0     0 ACCEPT     tcp  --  *      *       0.0.0.0/0            0.0.0.0/0            state RELATED,ESTABLISHED tcp spt:8502
3     787K 1473M RETURN     all  --  *      *       0.0.0.0/0            0.0.0.0/0

Chain DOCKER-USER (1 references)
num   pkts bytes target     prot opt in     out     source               destination
1     792K 1474M RETURN     all  --  *      *       0.0.0.0/0            0.0.0.0/0

Chain DOCKER (2 references)
num   pkts bytes target     prot opt in     out     source               destination

Chain DOCKER-ISOLATION-STAGE-1 (1 references)
num   pkts bytes target     prot opt in     out     source               destination
1        0     0 DOCKER-ISOLATION-STAGE-2  all  --  docker0 !docker0  0.0.0.0/0            0.0.0.0/0
2     402K 1254M DOCKER-ISOLATION-STAGE-2  all  --  docker_gwbridge !docker_gwbridge  0.0.0.0/0            0.0.0.0/0
3     787K 1473M RETURN     all  --  *      *       0.0.0.0/0            0.0.0.0/0

Chain DOCKER-ISOLATION-STAGE-2 (2 references)
num   pkts bytes target     prot opt in     out     source               destination
1        0     0 DROP       all  --  *      docker0  0.0.0.0/0            0.0.0.0/0
2        0     0 DROP       all  --  *      docker_gwbridge  0.0.0.0/0            0.0.0.0/0
3     402K 1254M RETURN     all  --  *      *       0.0.0.0/0            0.0.0.0/0
BKaun
  • `curl -v nginx1:80` would work only from container in same stack (not from host). Can you share docker-compose.yml file too? – Facty Feb 19 '21 at 11:23
  • There's no compose file. I'm simply creating one overlay network from the manager. Then launching two containers that are attached to that network. One is the nginx container and the other is just a bash (netshoot). I can ping the nginx from the bash, so the network and dns seem to be working. Yet I cannot curl or telnet or nc. – BKaun Feb 19 '21 at 18:04
  • We're still facing this problem, but something of relevance seems to be that everything works perfectly fine on RHEL 7. RHEL 8 causes this issue. – BKaun Feb 22 '21 at 19:33
  • 1
    Sorry, didn't have time. Can you then share the commands used to start the containers? Also (..I'm terrible with IPTABLES..) `firewall-cmd --list-ports`, `firewall-cmd --list-services` & `firewall-cmd --list-all-zones` – Facty Feb 23 '21 at 10:42
  • Thanks. It turned out to be a network issue after all. I've posted the answer. – BKaun Feb 23 '21 at 17:55

4 Answers


The issue was indeed the bad checksums on the outbound packets. The VMware network interface was dropping the packets due to bad checksums.

The solution was to disable TX checksum offloading on the interface, using ethtool:

# ethtool -K <interface> tx off
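Note that an ethtool setting like this does not survive a reboot. One way to make it persistent on RHEL 8 (a sketch; the interface name `ens192` is an assumption, substitute your own) is a NetworkManager dispatcher script, made executable with `chmod +x`:

```shell
#!/bin/bash
# /etc/NetworkManager/dispatcher.d/99-tx-checksum-off
# Dispatcher scripts receive the interface as $1 and the action as $2.
# Disable TX checksum offloading whenever the uplink comes up, so that
# VXLAN-encapsulated packets are not dropped for bad checksums.
if [ "$1" = "ens192" ] && [ "$2" = "up" ]; then
    ethtool -K ens192 tx off
fi
```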
BKaun
  • Hi! Thanks for sharing an answer. I spent almost a day debugging. You are my day saver! – valc May 17 '21 at 07:57
  • 1
    I spent more than one day debugging! I hope this resolves it... – CivFan Jun 03 '21 at 23:40
  • 2
    I have executed `ethtool -K ens160 tx off`, `ethtool -K docker0 tx off`, and `ethtool -K docker_gwbridge tx off` on all my nodes, but the problem still exists :( – Libraco Dec 13 '21 at 13:47
  • Did you ever figure out why checksum offloading was causing bad checksums? What version of ESXi was on your hosts? – Kayson Oct 04 '22 at 15:53
  • @CivFan me too `ethtool -K ens192 tx off` solved my issue – Saeb Molaee Oct 15 '22 at 11:53

I had the exact same problem (the only thing that was working in my overlay network was ping; everything else just disappeared). This thread saved me after days of pulling my hair out, so I thought I'd add my two cents.

This was also on VMware servers, running Ubuntu 22.04. My solution was to change the network interface type from vmxnet3 to a simple E1000E card, and suddenly everything just started working. So obviously there's something weird happening in vmxnet3. What baffles me is that this doesn't seem to be a bigger issue for more users; running a Docker swarm on VMware servers should be pretty normal, right?

Daniel Malmgren
  • Thanks so much for sharing this. I can confirm I see the same which indeed points to the vmxnet3 checksum implementation (or handling of failures) specifically. I did lose days investigating this problem! – fifofonix Jan 13 '23 at 21:50
  • 1
    Actually, I've got more information since writing this that I can share. It seems the core problem is that both vmxnet3 and the Docker swarm overlay network use the same UDP port, 4789. I've got indications that a solution might be changing that port, which you can do with the --data-path-port argument when doing the docker swarm init. I haven't had time to test this myself (so I'm still on E1000E), but it might be worth trying! – Daniel Malmgren Jan 15 '23 at 09:23
  • @DanielMalmgren: thank you so much for sharing this. I've been struggling with Docker Swarm connectivity issues for a long time. Nothing helped; I was considering switching to Kubernetes. Trying your suggestion to specify a different `--data-path-port` fixed everything. I'm so happy I can keep on using Docker Swarm (because I love it). Thanks again! FYI: I'm using KVM-virtualized Ubuntu VMs at a small hosting provider. – user1018303 Mar 17 '23 at 07:46
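For anyone wanting to try the port change suggested in the comments: the data-path port can only be set when the swarm is first created, so the swarm has to be re-initialized. A sketch (7789 is an arbitrary unused UDP port, and 10.1.72.70 is the manager IP from the question; substitute your own; firewall rules allowing 4789 would need updating to the new port too):

```shell
# Re-create the swarm with a non-default VXLAN data-path port.
# The port cannot be changed on an existing swarm.
docker swarm init --data-path-port 7789 --advertise-addr 10.1.72.70
```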

The same issue was solved for me without using ethtool, just by setting endpoint_mode to dnsrr and using host mode for publishing ports. Here are the changes I added to my compose file:

1- Publishing ports in host mode:
ports:
  - target: 2379
    published: 2379
    protocol: tcp
    mode: host
2- Using DNS round-robin instead of a virtual IP:
deploy:
  endpoint_mode: dnsrr
3- Adding a hostname:
hostname: <service_name>
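Put together, the fragments above would look something like this in a stack file (a sketch; the service name `myservice`, the image, and port 2379 are placeholders):

```yaml
version: "3.7"
services:
  myservice:
    image: myimage          # placeholder image
    hostname: myservice     # 3- fixed hostname matching the service name
    ports:
      - target: 2379        # container port (placeholder)
        published: 2379
        protocol: tcp
        mode: host          # 1- publish directly on the host, bypassing the ingress mesh
    deploy:
      endpoint_mode: dnsrr  # 2- DNS round-robin instead of the routing-mesh VIP
```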

Just in case anyone tried ethtool -K <iface name> tx off and it still does not work, try changing the MTU of your overlay network to something lower than the standard 1500.

For example:

docker network create -d overlay --attachable --opt com.docker.network.driver.mtu=1450 my-network
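You can then check that the option took effect (a sketch, assuming the network name from the example above):

```shell
# Print the MTU driver option set on the overlay network
docker network inspect my-network \
  --format '{{ index .Options "com.docker.network.driver.mtu" }}'
```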
Pujianto