I have a Docker swarm running Docker version 1.13.1. I regularly deploy stacks of Docker services (via docker stack deploy) to this swarm, and I have one nginx proxy service that listens on ports 80 and 443 and acts as a reverse proxy to various applications in the swarm.

The problem I ran into with nginx's upstream capability was that it cached the DNS lookups of my service names. This worked fine for a while, but as more stacks were removed and redeployed, those cached IP addresses became stale and nginx would start timing out or sending requests to the wrong container.

I attempted to fix this using the following technique:

[in nginx.conf]
server {
  server_name myapp.domain.com;
  resolver 127.0.0.11 valid=10s ipv6=off;

  set $myapp http://stack_myapp:80; # stack_myapp is the DNS name of the service
  location / {
    proxy_pass $myapp;
  }
}

# other similar server blocks

127.0.0.11 appears to be the IP address of the internal DNS server that the swarm sets up. This seems to work most of the time: the IP addresses of the upstream services are not cached for long, and the proxy recovers when upstream services move around. However, the proxy still occasionally sends requests to incorrect addresses. For example, it will send a request to http://10.0.0.12:80/... and either time out or hit the wrong container. When I run docker exec proxycontainer ping stack_myapp, I get the correct IP address. Why is nginx not resolving the correct IP when ping does?
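
One thing that may be worth checking is whether the name resolves to more than one address. ping only reports the single address it ends up using, whereas nginx, as far as I understand its proxy_pass documentation, rotates through every address a name resolves to. Below is a minimal Python sketch for listing all addresses and flagging any that do not match the ones I expect. The function names resolve_all and unexpected_addresses are made up for this illustration, and it assumes the script runs in a container on the same overlay network that has Python available.

```python
import socket


def resolve_all(name, port=80):
    """Return every distinct IPv4 address the system resolver gives for name."""
    infos = socket.getaddrinfo(
        name, port, family=socket.AF_INET, type=socket.SOCK_STREAM
    )
    addresses = []
    for info in infos:
        ip = info[4][0]  # sockaddr is (host, port) for AF_INET
        if ip not in addresses:
            addresses.append(ip)
    return addresses


def unexpected_addresses(resolved, expected):
    """Return the resolved addresses that are not in the expected list.

    Entries in expected may carry a CIDR suffix such as "10.0.0.12/24";
    the suffix is ignored when comparing.
    """
    wanted = {addr.split("/")[0] for addr in expected}
    return [ip for ip in resolved if ip not in wanted]
```

Calling `unexpected_addresses(resolve_all("stack_myapp"), ["10.0.0.12"])` inside the container would then list any address the DNS server hands out besides the one I believe is correct. The expected address here is only the example value from above and has to be replaced with whatever the service should actually resolve to.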

Joshua Barron
  • If you do an `nslookup stack_myapp` from inside the container, are you getting 10.0.0.12? That should be the VIP for the service, which docker will internally proxy through the mesh network to a container assigned to that service. You'll see this IP in a `docker service inspect stack_myapp` – BMitch Feb 24 '17 at 20:21
  • Running `nslookup` yields two addresses, one being 10.0.0.12 and the other being the IP of the service (at least the one I get when I run `hostname -I` on that container). So am I running into an error in the mesh routing? It seems like the network is sending the request to an incorrect container. – Joshua Barron Feb 24 '17 at 21:43
  • Actually, I just noticed that only one of the IPs is correct. One of the IPs correctly resolves to the mesh address of target container (10.0.0.12), but the other IP listed by `nslookup`, 10.0.0.3, is the mesh address of a completely different container (someotherstack_container). Is the DNS lookup stale? The 10.0.0.3 *used* to be the mesh address of the correct container, but after a stack deploy things got shuffled around. – Joshua Barron Feb 24 '17 at 21:56
