
I basically followed this guide: Docker Containers with Public IPs

We already have a similar setup working in another location, but I can't get it working in a new environment. Sadly, my predecessor didn't document anything, so I'm trying to reverse engineer the setup.

Docker Host: 10.10.60.41/24

with a Docker bridge network: docker network create --subnet=10.60.0.0/16 --opt "com.docker.network.bridge.name"="br-ext" ext

routes on docker host:

#  ip r
default via 10.10.60.1 dev br0 proto static 
10.10.60.0/24 dev br0 proto kernel scope link src 10.10.60.41 
10.60.0.0/16 dev br-ext proto kernel scope link src 10.60.0.1 
172.17.0.0/16 dev docker0 proto kernel scope link src 172.17.0.1 linkdown

run a docker container: docker run --network=ext -itd --name=web nginx

That docker container gets IP 10.60.0.2 assigned.
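
For reference, the assigned address can be confirmed with docker inspect (the Go template below assumes the network is named ext, as above):

# docker inspect -f '{{ .NetworkSettings.Networks.ext.IPAddress }}' web
10.60.0.2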

ping 10.60.0.2 or curl 10.60.0.2 from the Docker host works fine... as expected.

But the Docker container is not reachable from the rest of the network. A route for 10.60.0.0/16 via the primary IP of the Docker host (10.10.60.41) is set on the gateway.
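
For completeness, this is roughly how that route is set on the gateway (a sketch; the interface name ens3 comes from the gateway's ip r output quoted in the comments below):

# ip route add 10.60.0.0/16 via 10.10.60.41 dev ens3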

# iptables -L
Chain INPUT (policy ACCEPT)
target     prot opt source               destination         

Chain FORWARD (policy ACCEPT)
target     prot opt source               destination         
DOCKER-USER  all  --  anywhere             anywhere            
DOCKER-ISOLATION-STAGE-1  all  --  anywhere             anywhere            
ACCEPT     all  --  anywhere             anywhere             ctstate RELATED,ESTABLISHED
DOCKER     all  --  anywhere             anywhere            
ACCEPT     all  --  anywhere             anywhere            
ACCEPT     all  --  anywhere             anywhere             ctstate RELATED,ESTABLISHED
DOCKER     all  --  anywhere             anywhere            
ACCEPT     all  --  anywhere             anywhere            
ACCEPT     all  --  anywhere             anywhere            
ACCEPT     all  --  anywhere             anywhere            

Chain OUTPUT (policy ACCEPT)
target     prot opt source               destination         

Chain DOCKER (2 references)
target     prot opt source               destination         

Chain DOCKER-ISOLATION-STAGE-1 (1 references)
target     prot opt source               destination         
DOCKER-ISOLATION-STAGE-2  all  --  anywhere             anywhere            
DOCKER-ISOLATION-STAGE-2  all  --  anywhere             anywhere            
RETURN     all  --  anywhere             anywhere            

Chain DOCKER-ISOLATION-STAGE-2 (2 references)
target     prot opt source               destination         
DROP       all  --  anywhere             anywhere            
DROP       all  --  anywhere             anywhere            
RETURN     all  --  anywhere             anywhere            

Chain DOCKER-USER (1 references)
target     prot opt source               destination         
RETURN     all  --  anywhere             anywhere
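
Note: plain iptables -L hides interface match criteria, which is why several ACCEPT rules above look identical; to see the full rule specifications (including -i/-o), one would use:

# iptables -S FORWARD
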
# iptables -t nat -L -n -v
Chain PREROUTING (policy ACCEPT 35363 packets, 2140K bytes)
 pkts bytes target     prot opt in     out     source               destination         
 140K 8413K DOCKER     all  --  *      *       0.0.0.0/0            0.0.0.0/0            ADDRTYPE match dst-type LOCAL

Chain INPUT (policy ACCEPT 24828 packets, 1495K bytes)
 pkts bytes target     prot opt in     out     source               destination         

Chain OUTPUT (policy ACCEPT 286 packets, 19813 bytes)
 pkts bytes target     prot opt in     out     source               destination         
    6   504 DOCKER     all  --  *      *       0.0.0.0/0           !127.0.0.0/8          ADDRTYPE match dst-type LOCAL

Chain POSTROUTING (policy ACCEPT 10799 packets, 659K bytes)
 pkts bytes target     prot opt in     out     source               destination         
    6   504 MASQUERADE  all  --  *      !br-ext  10.60.0.0/16         0.0.0.0/0           
    0     0 MASQUERADE  all  --  *      !docker0  172.17.0.0/16        0.0.0.0/0           

Chain DOCKER (2 references)
 pkts bytes target     prot opt in     out     source               destination         
    2   168 RETURN     all  --  br-ext *       0.0.0.0/0            0.0.0.0/0           
    0     0 RETURN     all  --  docker0 *       0.0.0.0/0            0.0.0.0/0
# sysctl net.ipv4.ip_forward
net.ipv4.ip_forward = 1

The two setups are basically identical except for subnets etc., but it looks like I'm missing something here... any help would be greatly appreciated.

Thanks in advance and have a nice day!

=====

EDIT: answer to larsks

Yes, packets can reach the host/container (10.10.60.6 > 10.60.1.25):

# tcpdump -n -i any icmp
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on any, link-type LINUX_SLL (Linux cooked v1), capture size 262144 bytes
15:34:52.257656 IP 10.10.60.6 > 10.60.1.25: ICMP echo request, id 879, seq 1, length 64
15:34:52.257731 IP 10.10.60.6 > 10.60.1.25: ICMP echo request, id 879, seq 1, length 64
15:34:52.257741 IP 10.10.60.6 > 10.60.1.25: ICMP echo request, id 879, seq 1, length 64
15:34:52.257799 IP 10.60.1.25 > 10.10.60.6: ICMP echo reply, id 879, seq 1, length 64
15:34:52.257799 IP 10.60.1.25 > 10.10.60.6: ICMP echo reply, id 879, seq 1, length 64
15:34:52.257826 IP 10.60.1.25 > 10.10.60.6: ICMP echo reply, id 879, seq 1, length 64

An ICMP reply is even sent.

On the host 10.10.60.6 that is sending the ICMP requests, no replies arrive:

# tcpdump -i any icmp and host 10.60.1.25
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on any, link-type LINUX_SLL (Linux cooked), capture size 262144 bytes
15:36:52.042690 IP vpnconnect > 10.60.1.25: ICMP echo request, id 879, seq 118, length 64
15:36:53.066672 IP vpnconnect > 10.60.1.25: ICMP echo request, id 879, seq 119, length 64
15:36:54.090729 IP vpnconnect > 10.60.1.25: ICMP echo request, id 879, seq 120, length 64
15:36:55.114713 IP vpnconnect > 10.60.1.25: ICMP echo request, id 879, seq 121, length 64

Additional info: sending an ICMP request from one of the Docker containers to 10.10.60.6 works:

$ ping 10.10.60.6
PING 10.10.60.6 (10.10.60.6): 56 data bytes
64 bytes from 10.10.60.6: seq=0 ttl=42 time=1.051 ms
64 bytes from 10.10.60.6: seq=1 ttl=42 time=0.738 ms

On 10.10.60.6 it looks like this:

# tcpdump -i any icmp and host 10.10.60.41
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on any, link-type LINUX_SLL (Linux cooked), capture size 262144 bytes
15:40:57.489752 IP 10.10.60.41 > host: ICMP echo request, id 42, seq 38, length 64
15:40:57.489771 IP host > 10.10.60.41: ICMP echo reply, id 42, seq 38, length 64

The requests appear to come from, and the replies go to, the Docker host's IP (10.10.60.41) rather than the container's; presumably the MASQUERADE rule in the POSTROUTING chain above is rewriting the source address(?)

mab
  • You say, "A network route for 10.60.0.0/16 to the primary IP of the docker host 10.10.60.41 is set", but where is that set? Can you show the configuration? – larsks Aug 13 '22 at 22:24
  • If you attempt to access the container from elsewhere on the network, do the packets reach your host (e.g., using `tcpdump`)? – larsks Aug 13 '22 at 22:25
  • routes are set on the gateway: `# ip r` [..] `10.60.0.0/16 via 10.10.60.41 dev ens3` [..] – mab Aug 15 '22 at 07:38
  • ...and the second question? – larsks Aug 15 '22 at 11:26
  • It's really better if you *update your question* when adding new information, because it's not possible to format things usefully in comments. In any case, I've added an answer since that last comment that hopefully helps a bit. – larsks Aug 15 '22 at 15:30
  • thanks, I updated the original question and deleted the comment, to make it a little clearer – mab Aug 15 '22 at 16:26

2 Answers


I've reproduced your environment in virtual machines in order to take a look at the problem. You can find the complete configuration here.


In this configuration, I have the following three nodes:

  • node1 is the docker host @ 10.10.60.41.
  • node2 is the router at 10.10.60.1.
  • node3 is a random other host on the network at 10.10.60.20.

I've created the docker network as in your example, so that on node1, the container host, I have bridge br-ext configured like this:

node1# ip addr show br-ext
5: br-ext: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default
    link/ether 02:42:83:d4:10:54 brd ff:ff:ff:ff:ff:ff
    inet 10.60.0.1/16 brd 10.60.255.255 scope global br-ext
       valid_lft forever preferred_lft forever
    inet6 fe80::42:83ff:fed4:1054/64 scope link
       valid_lft forever preferred_lft forever

And the following routing table (you can ignore the routes to 192.168.121.0/24; this is an artifact of how vagrant handles configuration):

default via 10.10.60.1 dev eth1
10.10.60.0/24 dev eth1 proto kernel scope link src 10.10.60.41 metric 101
10.60.0.0/16 dev br-ext proto kernel scope link src 10.60.0.1
172.17.0.0/16 dev docker0 proto kernel scope link src 172.17.0.1 linkdown
192.168.121.0/24 dev eth0 proto kernel scope link src 192.168.121.12 metric 100

And on node2, the network router, I have:

default via 192.168.121.1 dev eth0 proto dhcp metric 100
10.10.60.0/24 dev eth1 proto kernel scope link src 10.10.60.1 metric 101
10.60.0.0/16 via 10.10.60.41 dev eth1 proto static metric 101
192.168.121.0/24 dev eth0 proto kernel scope link src 192.168.121.181 metric 100

On node3, I have:

default via 10.10.60.1 dev eth1
10.10.60.0/24 dev eth1 proto kernel scope link src 10.10.60.20 metric 101
192.168.121.0/24 dev eth0 proto kernel scope link src 192.168.121.184 metric 100

With the above configuration in place, if I run on node1:

node1# tcpdump -n -i any icmp

And on node3 I run:

node3# ping 10.60.0.2

I see in the output of tcpdump:

14:12:25.777825 eth1  In  IP 10.10.60.1 > 10.60.0.2: ICMP echo request, id 2, seq 1, length 64
14:12:26.836689 eth1  In  IP 10.10.60.1 > 10.60.0.2: ICMP echo request, id 2, seq 2, length 64
14:12:27.860833 eth1  In  IP 10.10.60.1 > 10.60.0.2: ICMP echo request, id 2, seq 3, length 64

So, the ICMP echo requests are showing up on node1, which means that our network routing is correct (node3 is sending the requests via node2, the network router, which is correctly passing them on to node1)...but we're not seeing any replies. What could be the reason?

One mechanism we can use to diagnose this is to enable packet tracing in our netfilter configuration on node1:

node1# iptables -t raw -A PREROUTING -s 10.10.60.20 -j TRACE
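
If no TRACE lines show up in the kernel log, the nf_log backend may need to be enabled first; with iptables-legacy on IPv4 that is typically:

node1# modprobe nf_log_ipv4
node1# sysctl net.netfilter.nf_log.2=nf_log_ipv4
node1# journalctl -kf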

Now when we attempt to ping from node3 to node1, we see the following kernel logs:

Aug 15 14:19:05 node1 kernel: TRACE: raw:PREROUTING:policy:2 IN=eth1 OUT= MAC=52:54:00:95:12:24:52:54:00:f4:d3:e4:08:00 SRC=10.10.60.20 DST=10.60.0.2 LEN=84 TOS=0x00 PREC=0x00 TTL=63 ID=20048 DF PROTO=ICMP TYPE=8 CODE=0 ID=3 SEQ=1
Aug 15 14:19:05 node1 kernel: TRACE: nat:PREROUTING:policy:2 IN=eth1 OUT= MAC=52:54:00:95:12:24:52:54:00:f4:d3:e4:08:00 SRC=10.10.60.20 DST=10.60.0.2 LEN=84 TOS=0x00 PREC=0x00 TTL=63 ID=20048 DF PROTO=ICMP TYPE=8 CODE=0 ID=3 SEQ=1
Aug 15 14:19:05 node1 kernel: TRACE: filter:FORWARD:rule:1 IN=eth1 OUT=br-ext MAC=52:54:00:95:12:24:52:54:00:f4:d3:e4:08:00 SRC=10.10.60.20 DST=10.60.0.2 LEN=84 TOS=0x00 PREC=0x00 TTL=62 ID=20048 DF PROTO=ICMP TYPE=8 CODE=0 ID=3 SEQ=1
Aug 15 14:19:05 node1 kernel: TRACE: filter:DOCKER-USER:return:1 IN=eth1 OUT=br-ext MAC=52:54:00:95:12:24:52:54:00:f4:d3:e4:08:00 SRC=10.10.60.20 DST=10.60.0.2 LEN=84 TOS=0x00 PREC=0x00 TTL=62 ID=20048 DF PROTO=ICMP TYPE=8 CODE=0 ID=3 SEQ=1
Aug 15 14:19:05 node1 kernel: TRACE: filter:FORWARD:rule:2 IN=eth1 OUT=br-ext MAC=52:54:00:95:12:24:52:54:00:f4:d3:e4:08:00 SRC=10.10.60.20 DST=10.60.0.2 LEN=84 TOS=0x00 PREC=0x00 TTL=62 ID=20048 DF PROTO=ICMP TYPE=8 CODE=0 ID=3 SEQ=1
Aug 15 14:19:05 node1 kernel: TRACE: filter:DOCKER-ISOLATION-STAGE-1:return:3 IN=eth1 OUT=br-ext MAC=52:54:00:95:12:24:52:54:00:f4:d3:e4:08:00 SRC=10.10.60.20 DST=10.60.0.2 LEN=84 TOS=0x00 PREC=0x00 TTL=62 ID=20048 DF PROTO=ICMP TYPE=8 CODE=0 ID=3 SEQ=1
Aug 15 14:19:05 node1 kernel: TRACE: filter:FORWARD:rule:4 IN=eth1 OUT=br-ext MAC=52:54:00:95:12:24:52:54:00:f4:d3:e4:08:00 SRC=10.10.60.20 DST=10.60.0.2 LEN=84 TOS=0x00 PREC=0x00 TTL=62 ID=20048 DF PROTO=ICMP TYPE=8 CODE=0 ID=3 SEQ=1
Aug 15 14:19:05 node1 kernel: TRACE: filter:DOCKER:return:1 IN=eth1 OUT=br-ext MAC=52:54:00:95:12:24:52:54:00:f4:d3:e4:08:00 SRC=10.10.60.20 DST=10.60.0.2 LEN=84 TOS=0x00 PREC=0x00 TTL=62 ID=20048 DF PROTO=ICMP TYPE=8 CODE=0 ID=3 SEQ=1
Aug 15 14:19:05 node1 kernel: TRACE: filter:FORWARD:policy:11 IN=eth1 OUT=br-ext MAC=52:54:00:95:12:24:52:54:00:f4:d3:e4:08:00 SRC=10.10.60.20 DST=10.60.0.2 LEN=84 TOS=0x00 PREC=0x00 TTL=62 ID=20048 DF PROTO=ICMP TYPE=8 CODE=0 ID=3 SEQ=1

(Don't forget to disable tracing at this point: iptables -t raw -F PREROUTING.)
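
If there are other rules in the raw table, deleting just the trace rule is safer than flushing the whole chain:

node1# iptables -t raw -D PREROUTING -s 10.10.60.20 -j TRACE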

Looking at the above logs, we see that our packet eventually entered the FORWARD chain, where it failed to match any terminal rule and finally fell off the bottom, to be handled by the chain's default policy:

Aug 15 14:19:05 node1 kernel: TRACE: filter:FORWARD:policy:11 ...

Which has been set to DROP by the Docker installation:

node1# iptables -S FORWARD
-P FORWARD DROP
-A FORWARD -j DOCKER-USER
-A FORWARD -j DOCKER-ISOLATION-STAGE-1
-A FORWARD -o br-ext -m conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT
-A FORWARD -o br-ext -j DOCKER
-A FORWARD -i br-ext ! -o br-ext -j ACCEPT
-A FORWARD -i br-ext -o br-ext -j ACCEPT
-A FORWARD -o docker0 -m conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT
-A FORWARD -o docker0 -j DOCKER
-A FORWARD -i docker0 ! -o docker0 -j ACCEPT
-A FORWARD -i docker0 -o docker0 -j ACCEPT

The solution here is to add a new rule explicitly permitting traffic from the host network to the container network. On node1:

node1# iptables -A FORWARD -s 10.10.60.0/24 -d 10.60.0.0/16 -j ACCEPT
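
This rule won't persist across reboots, and note that Docker's documentation recommends placing custom forwarding rules in the DOCKER-USER chain (which Docker creates but does not manage) rather than directly in FORWARD; an equivalent rule there would be:

node1# iptables -I DOCKER-USER -s 10.10.60.0/24 -d 10.60.0.0/16 -j ACCEPT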

With this rule in place, I can now ping successfully from node3 to the nginx container on node1:

node3# ping -c2 10.60.0.2
PING 10.60.0.2 (10.60.0.2) 56(84) bytes of data.
64 bytes from 10.60.0.2: icmp_seq=1 ttl=63 time=0.417 ms
64 bytes from 10.60.0.2: icmp_seq=2 ttl=63 time=0.328 ms

--- 10.60.0.2 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1047ms
rtt min/avg/max/mdev = 0.328/0.372/0.417/0.044 ms

I can also successfully access the Nginx instance running in the web container:

node3# curl 10.60.0.2
<!DOCTYPE html>
<html>
<head>
<title>Welcome to nginx!</title>
.
.
.

And connectivity in the reverse direction works as well; on node1:

node1# docker exec -it web apt-get update
node1# docker exec -it web apt-get -y install iputils-ping
node1# docker exec -it web ping -c2 10.10.60.20
PING 10.10.60.20 (10.10.60.20) 56(84) bytes of data.
64 bytes from 10.10.60.20: icmp_seq=1 ttl=63 time=0.304 ms
64 bytes from 10.10.60.20: icmp_seq=2 ttl=63 time=0.378 ms

--- 10.10.60.20 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1024ms
rtt min/avg/max/mdev = 0.304/0.341/0.378/0.037 ms

This may not exactly match your configuration: for example, in your output, your FORWARD chain has a default policy of ACCEPT, which suggests that maybe something else is going on... but hopefully this gives you a few ways to troubleshoot the problem.

If you'd like to update your question with additional information about your environment, I would be happy to take another look.

larsks
  • Many thanks again! Indeed this works... I tested it again in another environment with no problems at all, except I had to add `iptables -P FORWARD ACCEPT`, and then it worked. I'll have to open a support request with the cloud provider, I think... it has to be something in their environment. – mab Aug 16 '22 at 12:20
  • Just a little follow-up: it turned out to be what I already suspected... subnets defined locally on the instances are unknown to OpenStack and are not routed between the VMs. I didn't know that... Many thanks. – mab Aug 17 '22 at 14:20

You cannot ping a Docker container from an external host by default.
By default, any service running inside a Docker container is not "published" (Docker terminology) and cannot be reached from outside. You have to explicitly define the ports you want published when you run your container.
In your case, I don't see any port publishing in your docker run command, and since you are using nginx, I guess you should publish at least port 80.
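
For example, a minimal sketch (mapping host port 80 is an assumption; choose whatever external port you need):

docker run --network=ext -itd -p 80:80 --name=web nginx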

Salar
  • Hi, and thanks for your reply. I know what you mean, but it's not what I/we want. When I publish/expose port 80 of the nginx service/container, then I can reach nginx at [DOCKER_HOST_IP]:80. But we want the container to be reachable over its own IP, i.e. 10.60.0.2. – mab Aug 15 '22 at 07:34
  • So what you want is only possible by using network drivers other than bridge. I think for your scenario you can use the IPVLAN network driver. Here is a link that describes all Docker network drivers in detail: 'https://www.youtube.com/watch?v=bKFMS5C4CG0' – Salar Aug 15 '22 at 11:34
  • @Salar, that's incorrect. This configuration is entirely possible using the `bridge` driver. – larsks Aug 15 '22 at 13:57
  • @larsks I didn't say it is impossible, but when you have easier options based on your needs, why insist on the hard way? – Salar Aug 16 '22 at 08:49
  • You said, "what you want is only possible by using other network drivers than bridge". I'm pretty sure that is another way of saying, "it's impossible". – larsks Aug 16 '22 at 11:45