
I'm running CoreOS and trying to get service 1 to talk to service 2. If both services are launched on the same instance, everything works. However, if they're scheduled on different instances, no dice. Here are my replication controllers:

$ kubectl get replicationcontrollers
CONTROLLER      CONTAINER(S)   IMAGE(S)       SELECTOR                REPLICAS
app-server      etcd           redacted       k8s-app=app-server      1
kube-dns        kube2sky       redacted       k8s-app=kube-dns        1
                skydns         redacted                                 
static-server   etcd           redacted       k8s-app=static-server   1
web-server      etcd           redacted       k8s-app=web-server      1

Here's how the pods got scheduled:

$ kubectl get pods
POD                   IP           CONTAINER(S)   IMAGE(S)       HOST            LABELS                                                STATUS    CREATED      MESSAGE
app-server-g80uh      172.17.0.9                                 10.10.10.103/   k8s-app=app-server,name=app-server                    Running   11 minutes   
                                   etcd           redacted                                                                             Running   10 minutes   
kube-dns-t2zgb        172.17.0.2                                 10.10.10.102/   k8s-app=kube-dns,kubernetes.io/cluster-service=true   Running   37 minutes   
                                   kube2sky       redacted                                                                             Running   8 seconds    last termination: exit code 2
                                   skydns         redacted                                                                             Running   18 minutes   
static-server-lg4vs   172.17.0.2                                 10.10.10.104/   k8s-app=static-server,name=static-server              Running   11 minutes   
                                   etcd           redacted                                                                             Running   7 minutes    
web-server-wike6      172.17.0.6                                 10.10.10.102/   k8s-app=web-server,name=web-server                    Running   37 minutes   
                                   etcd           redacted                                                                             Running   19 minutes   

As you can see, the web server is on 10.10.10.102 and the upstream app server is on 10.10.10.103. If I curl the app-server's portal IP while SSH'd into the 10.10.10.103 instance, I get a 200 response:

$ curl -I 10.100.1.2:8080
HTTP/1.1 200 OK
Date: Wed, 15 Jul 2015 22:34:45 GMT
Content-Length: 690
Content-Type: text/html; charset=utf-8

Doing the same thing from 10.10.10.102 just hangs.
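(A further check I could run, sketched here without any captured output: watch flannel's VXLAN traffic on the hanging node while retrying the curl. 8472 is flannel's default VXLAN port, if I have that right.)

$ sudo tcpdump -ni eth1 udp port 8472   # VXLAN packets toward 10.10.10.103 should show up while the curl is retried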

When I look at the iptables dump, nothing seems amiss, but truthfully, I don't know what to look for.

$ sudo iptables -t nat -L --line-numbers
Chain PREROUTING (policy ACCEPT)
num  target     prot opt source               destination         
1    DOCKER     all  --  anywhere             anywhere             ADDRTYPE match dst-type LOCAL
2    KUBE-PORTALS-CONTAINER  all  --  anywhere             anywhere            

Chain INPUT (policy ACCEPT)
num  target     prot opt source               destination         

Chain OUTPUT (policy ACCEPT)
num  target     prot opt source               destination         
1    DOCKER     all  --  anywhere            !loopback/8           ADDRTYPE match dst-type LOCAL
2    KUBE-PORTALS-HOST  all  --  anywhere             anywhere            

Chain POSTROUTING (policy ACCEPT)
num  target     prot opt source               destination         
1    MASQUERADE  all  --  172.17.0.0/16        anywhere            
2    FLANNEL    all  --  172.17.0.0/16        anywhere            

Chain DOCKER (2 references)
num  target     prot opt source               destination         

Chain FLANNEL (1 references)
num  target     prot opt source               destination         
1    ACCEPT     all  --  anywhere             172.17.0.0/16       
2    MASQUERADE  all  --  anywhere            !base-address.mcast.net/4 

Chain KUBE-PORTALS-CONTAINER (1 references)
num  target     prot opt source               destination         
1    REDIRECT   tcp  --  anywhere             10.100.0.2           /* default/kubernetes: */ tcp dpt:https redir ports 43919
2    REDIRECT   tcp  --  anywhere             10.100.0.1           /* default/kubernetes-ro: */ tcp dpt:http redir ports 53665
3    REDIRECT   udp  --  anywhere             10.100.0.10          /* default/kube-dns:dns */ udp dpt:domain redir ports 44696
4    REDIRECT   tcp  --  anywhere             10.100.0.10          /* default/kube-dns:dns-tcp */ tcp dpt:domain redir ports 53151
5    REDIRECT   tcp  --  anywhere             10.100.1.2           /* default/app-server: */ tcp dpt:http-alt redir ports 53940
6    REDIRECT   tcp  --  anywhere             10.10.10.102         /* default/app-server: */ tcp dpt:http-alt redir ports 53940
7    REDIRECT   tcp  --  anywhere             10.10.10.103         /* default/app-server: */ tcp dpt:http-alt redir ports 53940
8    REDIRECT   tcp  --  anywhere             10.10.10.104         /* default/app-server: */ tcp dpt:http-alt redir ports 53940
9    REDIRECT   tcp  --  anywhere             10.100.1.1           /* default/web-server: */ tcp dpt:http redir ports 47191
10   REDIRECT   tcp  --  anywhere             10.10.10.102         /* default/web-server: */ tcp dpt:http redir ports 47191
11   REDIRECT   tcp  --  anywhere             10.10.10.103         /* default/web-server: */ tcp dpt:http redir ports 47191
12   REDIRECT   tcp  --  anywhere             10.10.10.104         /* default/web-server: */ tcp dpt:http redir ports 47191
13   REDIRECT   tcp  --  anywhere             10.100.1.3           /* default/static-server: */ tcp dpt:18080 redir ports 39414
14   REDIRECT   tcp  --  anywhere             10.10.10.102         /* default/static-server: */ tcp dpt:18080 redir ports 39414
15   REDIRECT   tcp  --  anywhere             10.10.10.103         /* default/static-server: */ tcp dpt:18080 redir ports 39414
16   REDIRECT   tcp  --  anywhere             10.10.10.104         /* default/static-server: */ tcp dpt:18080 redir ports 39414

Chain KUBE-PORTALS-HOST (1 references)
num  target     prot opt source               destination         
1    DNAT       tcp  --  anywhere             10.100.0.2           /* default/kubernetes: */ tcp dpt:https to:10.0.2.15:43919
2    DNAT       tcp  --  anywhere             10.100.0.1           /* default/kubernetes-ro: */ tcp dpt:http to:10.0.2.15:53665
3    DNAT       udp  --  anywhere             10.100.0.10          /* default/kube-dns:dns */ udp dpt:domain to:10.0.2.15:44696
4    DNAT       tcp  --  anywhere             10.100.0.10          /* default/kube-dns:dns-tcp */ tcp dpt:domain to:10.0.2.15:53151
5    DNAT       tcp  --  anywhere             10.100.1.2           /* default/app-server: */ tcp dpt:http-alt to:10.0.2.15:53940
6    DNAT       tcp  --  anywhere             10.10.10.102         /* default/app-server: */ tcp dpt:http-alt to:10.0.2.15:53940
7    DNAT       tcp  --  anywhere             10.10.10.103         /* default/app-server: */ tcp dpt:http-alt to:10.0.2.15:53940
8    DNAT       tcp  --  anywhere             10.10.10.104         /* default/app-server: */ tcp dpt:http-alt to:10.0.2.15:53940
9    DNAT       tcp  --  anywhere             10.100.1.1           /* default/web-server: */ tcp dpt:http to:10.0.2.15:47191
10   DNAT       tcp  --  anywhere             10.10.10.102         /* default/web-server: */ tcp dpt:http to:10.0.2.15:47191
11   DNAT       tcp  --  anywhere             10.10.10.103         /* default/web-server: */ tcp dpt:http to:10.0.2.15:47191
12   DNAT       tcp  --  anywhere             10.10.10.104         /* default/web-server: */ tcp dpt:http to:10.0.2.15:47191
13   DNAT       tcp  --  anywhere             10.100.1.3           /* default/static-server: */ tcp dpt:18080 to:10.0.2.15:39414
14   DNAT       tcp  --  anywhere             10.10.10.102         /* default/static-server: */ tcp dpt:18080 to:10.0.2.15:39414
15   DNAT       tcp  --  anywhere             10.10.10.103         /* default/static-server: */ tcp dpt:18080 to:10.0.2.15:39414
16   DNAT       tcp  --  anywhere             10.10.10.104         /* default/static-server: */ tcp dpt:18080 to:10.0.2.15:39414

Flannel is running.

$ ps axu | grep flannel
root      1065  0.0  0.0 101200  1780 ?        Ssl  Jul13   0:02 /usr/libexec/sdnotify-proxy /run/flannel/sd.sock /usr/bin/docker run --net=host --privileged=true --rm --volume=/run/flannel:/run/flannel --env=NOTIFY_SOCKET=/run/flannel/sd.sock --env=AWS_ACCESS_KEY_ID= --env=AWS_SECRET_ACCESS_KEY= --env-file=/run/flannel/options.env --volume=/usr/share/ca-certificates:/etc/ssl/certs:ro --volume=/etc/ssl/etcd:/etc/ssl/etcd:ro quay.io/coreos/flannel:0.5.0 /opt/bin/flanneld --ip-masq=true
root      1068  0.0  0.7 154104 15508 ?        Sl   Jul13   0:02 /usr/bin/docker run --net=host --privileged=true --rm --volume=/run/flannel:/run/flannel --env=NOTIFY_SOCKET=/run/flannel/sd.sock --env=AWS_ACCESS_KEY_ID= --env=AWS_SECRET_ACCESS_KEY= --env-file=/run/flannel/options.env --volume=/usr/share/ca-certificates:/etc/ssl/certs:ro --volume=/etc/ssl/etcd:/etc/ssl/etcd:ro quay.io/coreos/flannel:0.5.0 /opt/bin/flanneld --ip-masq=true
root      1137  0.0  0.2 200208  5548 ?        Ssl  Jul13   0:39 /opt/bin/flanneld --ip-masq=true

There be bridges:

$ ip addr
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default 
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host 
       valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000
    link/ether 08:00:27:ea:e5:46 brd ff:ff:ff:ff:ff:ff
    inet 10.0.2.15/24 brd 10.0.2.255 scope global dynamic eth0
       valid_lft 82556sec preferred_lft 82556sec
    inet6 fe80::a00:27ff:feea:e546/64 scope link 
       valid_lft forever preferred_lft forever
3: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000
    link/ether 08:00:27:dc:c6:c8 brd ff:ff:ff:ff:ff:ff
    inet 10.10.10.102/24 brd 10.10.10.255 scope global eth1
       valid_lft forever preferred_lft forever
    inet6 fe80::a00:27ff:fedc:c6c8/64 scope link 
       valid_lft forever preferred_lft forever
4: docker0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc netem state UP group default 
    link/ether 66:2d:b8:5e:0b:d9 brd ff:ff:ff:ff:ff:ff
    inet 172.17.42.1/16 scope global docker0
       valid_lft forever preferred_lft forever
    inet6 fe80::bccb:60ff:fe53:1c68/64 scope link 
       valid_lft forever preferred_lft forever
5: flannel.1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UNKNOWN group default 
    link/ether 9e:ed:e5:64:6f:0d brd ff:ff:ff:ff:ff:ff
    inet 172.17.21.0/16 scope global flannel.1
       valid_lft forever preferred_lft forever
    inet6 fe80::9ced:e5ff:fe64:6f0d/64 scope link 
       valid_lft forever preferred_lft forever
7: veth3f64f61: <BROADCAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master docker0 state UP group default 
    link/ether da:fe:d7:d0:04:89 brd ff:ff:ff:ff:ff:ff
    inet6 fe80::d8fe:d7ff:fed0:489/64 scope link 
       valid_lft forever preferred_lft forever
11: vetha8ba7a3: <BROADCAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master docker0 state UP group default 
    link/ether 66:2d:b8:5e:0b:d9 brd ff:ff:ff:ff:ff:ff
    inet6 fe80::642d:b8ff:fe5e:bd9/64 scope link 
       valid_lft forever preferred_lft forever
17: veth3dcd221: <BROADCAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master docker0 state UP group default 
    link/ether 6a:ef:f2:9e:5a:eb brd ff:ff:ff:ff:ff:ff
    inet6 fe80::68ef:f2ff:fe9e:5aeb/64 scope link 
       valid_lft forever preferred_lft forever

Here's the voluminous startup log for flannel (on 10.10.10.102):

$ systemctl status -n 1000 -l flanneld.service
● flanneld.service - Network fabric for containers
   Loaded: loaded (/usr/lib64/systemd/system/flanneld.service; static; vendor preset: disabled)
  Drop-In: /etc/systemd/system/flanneld.service.d
           └─50-network-config.conf
   Active: active (running) since Thu 2015-07-16 02:22:40 UTC; 1h 8min ago
     Docs: https://github.com/coreos/flannel
  Process: 1364 ExecStartPost=/usr/bin/docker run --net=host --rm -v /run:/run quay.io/coreos/flannel:${FLANNEL_VER} /opt/bin/mk-docker-opts.sh -d /run/flannel_docker_opts.env -i (code=exited, status=0/SUCCESS)
  Process: 1237 ExecStartPre=/usr/bin/etcdctl set /coreos.com/network/config {"Network":"172.17.0.0/16", "Backend": {"Type": "vxlan"}} (code=exited, status=0/SUCCESS)
  Process: 1222 ExecStartPre=/usr/bin/touch /run/flannel/options.env (code=exited, status=0/SUCCESS)
  Process: 1216 ExecStartPre=/usr/bin/mkdir -p ${ETCD_SSL_DIR} (code=exited, status=0/SUCCESS)
  Process: 1213 ExecStartPre=/usr/bin/mkdir -p /run/flannel (code=exited, status=0/SUCCESS)
  Process: 1212 ExecStartPre=/sbin/modprobe ip_tables (code=exited, status=0/SUCCESS)
 Main PID: 1246 (sdnotify-proxy)
   Memory: 2.0M
      CPU: 159ms
   CGroup: /system.slice/flanneld.service
           ├─1246 /usr/libexec/sdnotify-proxy /run/flannel/sd.sock /usr/bin/docker run --net=host --privileged=true --rm --volume=/run/flannel:/run/flannel --env=NOTIFY_SOCKET=/run/flannel/sd.sock --env=AWS_ACCESS_KEY_ID= --env=AWS_SECRET_ACCESS_KEY= --env-file=/run/flannel/options.env --volume=/usr/share/ca-certificates:/etc/ssl/certs:ro --volume=/etc/ssl/etcd:/etc/ssl/etcd:ro quay.io/coreos/flannel:0.5.0 /opt/bin/flanneld --ip-masq=true
           └─1249 /usr/bin/docker run --net=host --privileged=true --rm --volume=/run/flannel:/run/flannel --env=NOTIFY_SOCKET=/run/flannel/sd.sock --env=AWS_ACCESS_KEY_ID= --env=AWS_SECRET_ACCESS_KEY= --env-file=/run/flannel/options.env --volume=/usr/share/ca-certificates:/etc/ssl/certs:ro --volume=/etc/ssl/etcd:/etc/ssl/etcd:ro quay.io/coreos/flannel:0.5.0 /opt/bin/flanneld --ip-masq=true

Jul 16 02:22:19 node-01 systemd[1]: Starting Network fabric for containers...
Jul 16 02:22:19 node-01 etcdctl[1237]: {"Network":"172.17.0.0/16", "Backend": {"Type": "vxlan"}}
Jul 16 02:22:19 node-01 sdnotify-proxy[1246]: Unable to find image 'quay.io/coreos/flannel:0.5.0' locally
Jul 16 02:22:21 node-01 sdnotify-proxy[1246]: Pulling repository quay.io/coreos/flannel
Jul 16 02:22:23 node-01 sdnotify-proxy[1246]: 0fbceb3474ee: Pulling image (0.5.0) from quay.io/coreos/flannel
Jul 16 02:22:23 node-01 sdnotify-proxy[1246]: 0fbceb3474ee: Pulling image (0.5.0) from quay.io/coreos/flannel, endpoint: https://quay.io/v1/
Jul 16 02:22:24 node-01 sdnotify-proxy[1246]: 0fbceb3474ee: Pulling dependent layers
Jul 16 02:22:24 node-01 sdnotify-proxy[1246]: 91a6195f52a2: Pulling metadata
Jul 16 02:22:24 node-01 sdnotify-proxy[1246]: 91a6195f52a2: Pulling fs layer
Jul 16 02:22:30 node-01 sdnotify-proxy[1246]: 91a6195f52a2: Download complete
Jul 16 02:22:30 node-01 sdnotify-proxy[1246]: 2b8e51ef6b0f: Pulling metadata
Jul 16 02:22:31 node-01 sdnotify-proxy[1246]: 2b8e51ef6b0f: Pulling fs layer
Jul 16 02:22:32 node-01 sdnotify-proxy[1246]: 2b8e51ef6b0f: Download complete
Jul 16 02:22:32 node-01 sdnotify-proxy[1246]: 1503401e87d3: Pulling metadata
Jul 16 02:22:32 node-01 sdnotify-proxy[1246]: 1503401e87d3: Pulling fs layer
Jul 16 02:22:36 node-01 sdnotify-proxy[1246]: 1503401e87d3: Download complete
Jul 16 02:22:36 node-01 sdnotify-proxy[1246]: a6301219b9d9: Pulling metadata
Jul 16 02:22:36 node-01 sdnotify-proxy[1246]: a6301219b9d9: Pulling fs layer
Jul 16 02:22:38 node-01 sdnotify-proxy[1246]: a6301219b9d9: Download complete
Jul 16 02:22:38 node-01 sdnotify-proxy[1246]: 0fbceb3474ee: Pulling metadata
Jul 16 02:22:38 node-01 sdnotify-proxy[1246]: 0fbceb3474ee: Pulling fs layer
Jul 16 02:22:39 node-01 sdnotify-proxy[1246]: 0fbceb3474ee: Download complete
Jul 16 02:22:39 node-01 sdnotify-proxy[1246]: 0fbceb3474ee: Download complete
Jul 16 02:22:39 node-01 sdnotify-proxy[1246]: Status: Downloaded newer image for quay.io/coreos/flannel:0.5.0
Jul 16 02:22:39 node-01 sdnotify-proxy[1246]: I0716 02:22:39.913324 00001 main.go:275] Installing signal handlers
Jul 16 02:22:39 node-01 sdnotify-proxy[1246]: I0716 02:22:39.914253 00001 main.go:189] Using 10.10.10.102 as external interface
Jul 16 02:22:39 node-01 sdnotify-proxy[1246]: go-etcd2015/07/16 02:22:39 DEBUG: get /coreos.com/network/config [http://127.0.0.1:4001]
Jul 16 02:22:39 node-01 sdnotify-proxy[1246]: go-etcd2015/07/16 02:22:39 DEBUG: Connecting to etcd: attempt 1 for keys/coreos.com/network/config?quorum=false&recursive=false&sorted=false
Jul 16 02:22:39 node-01 sdnotify-proxy[1246]: go-etcd2015/07/16 02:22:39 DEBUG: send.request.to http://127.0.0.1:4001/v2/keys/coreos.com/network/config?quorum=false&recursive=false&sorted=false | method GET
Jul 16 02:22:39 node-01 sdnotify-proxy[1246]: go-etcd2015/07/16 02:22:39 DEBUG: recv.response.from
Jul 16 02:22:39 node-01 sdnotify-proxy[1246]: go-etcd2015/07/16 02:22:39 DEBUG: recv.success
Jul 16 02:22:39 node-01 sdnotify-proxy[1246]: go-etcd2015/07/16 02:22:39 DEBUG: get /coreos.com/network/config [http://127.0.0.1:4001]
Jul 16 02:22:39 node-01 sdnotify-proxy[1246]: go-etcd2015/07/16 02:22:39 DEBUG: Connecting to etcd: attempt 1 for keys/coreos.com/network/config?quorum=false&recursive=false&sorted=false
Jul 16 02:22:39 node-01 sdnotify-proxy[1246]: go-etcd2015/07/16 02:22:39 DEBUG: send.request.to http://127.0.0.1:4001/v2/keys/coreos.com/network/config?quorum=false&recursive=false&sorted=false | method GET
Jul 16 02:22:39 node-01 sdnotify-proxy[1246]: go-etcd2015/07/16 02:22:39 DEBUG: recv.response.from
Jul 16 02:22:39 node-01 sdnotify-proxy[1246]: go-etcd2015/07/16 02:22:39 DEBUG: recv.success
Jul 16 02:22:39 node-01 sdnotify-proxy[1246]: go-etcd2015/07/16 02:22:39 DEBUG: get /coreos.com/network/subnets [http://127.0.0.1:4001]
Jul 16 02:22:39 node-01 sdnotify-proxy[1246]: go-etcd2015/07/16 02:22:39 DEBUG: Connecting to etcd: attempt 1 for keys/coreos.com/network/subnets?quorum=false&recursive=true&sorted=false
Jul 16 02:22:39 node-01 sdnotify-proxy[1246]: go-etcd2015/07/16 02:22:39 DEBUG: send.request.to http://127.0.0.1:4001/v2/keys/coreos.com/network/subnets?quorum=false&recursive=true&sorted=false | method GET
Jul 16 02:22:39 node-01 sdnotify-proxy[1246]: go-etcd2015/07/16 02:22:39 DEBUG: recv.response.from
Jul 16 02:22:39 node-01 sdnotify-proxy[1246]: go-etcd2015/07/16 02:22:39 DEBUG: recv.success
Jul 16 02:22:39 node-01 sdnotify-proxy[1246]: I0716 02:22:39.928878 00001 etcd.go:212] Picking subnet in range 172.17.1.0 ... 172.17.255.0
Jul 16 02:22:39 node-01 sdnotify-proxy[1246]: go-etcd2015/07/16 02:22:39 DEBUG: put /coreos.com/network/subnets/172.17.21.0-24, {"PublicIP":"10.10.10.102","BackendType":"vxlan","BackendData":{"VtepMAC":"9e:ed:e5:64:6f:0d"}}, ttl: 86400, [http://127.0.0.1:4001]
Jul 16 02:22:39 node-01 sdnotify-proxy[1246]: go-etcd2015/07/16 02:22:39 DEBUG: Connecting to etcd: attempt 1 for keys/coreos.com/network/subnets/172.17.21.0-24?prevExist=false
Jul 16 02:22:39 node-01 sdnotify-proxy[1246]: go-etcd2015/07/16 02:22:39 DEBUG: send.request.to http://127.0.0.1:4001/v2/keys/coreos.com/network/subnets/172.17.21.0-24?prevExist=false | method PUT
Jul 16 02:22:39 node-01 sdnotify-proxy[1246]: go-etcd2015/07/16 02:22:39 DEBUG: recv.response.from
Jul 16 02:22:39 node-01 sdnotify-proxy[1246]: go-etcd2015/07/16 02:22:39 DEBUG: recv.success
Jul 16 02:22:39 node-01 sdnotify-proxy[1246]: I0716 02:22:39.940622 00001 etcd.go:91] Subnet lease acquired: 172.17.21.0/24
Jul 16 02:22:39 node-01 sdnotify-proxy[1246]: I0716 02:22:39.950849 00001 ipmasq.go:47] Adding iptables rule: FLANNEL -d 172.17.0.0/16 -j ACCEPT
Jul 16 02:22:39 node-01 sdnotify-proxy[1246]: I0716 02:22:39.960965 00001 ipmasq.go:47] Adding iptables rule: FLANNEL ! -d 224.0.0.0/4 -j MASQUERADE
Jul 16 02:22:39 node-01 sdnotify-proxy[1246]: I0716 02:22:39.973317 00001 ipmasq.go:47] Adding iptables rule: POSTROUTING -s 172.17.0.0/16 -j FLANNEL
Jul 16 02:22:39 node-01 sdnotify-proxy[1246]: I0716 02:22:39.977948 00001 vxlan.go:153] Watching for L3 misses
Jul 16 02:22:39 node-01 sdnotify-proxy[1246]: I0716 02:22:39.978037 00001 vxlan.go:159] Watching for new subnet leases
Jul 16 02:22:39 node-01 sdnotify-proxy[1246]: go-etcd2015/07/16 02:22:39 DEBUG: get /coreos.com/network/subnets [http://127.0.0.1:4001]
Jul 16 02:22:39 node-01 sdnotify-proxy[1246]: go-etcd2015/07/16 02:22:39 DEBUG: Connecting to etcd: attempt 1 for keys/coreos.com/network/subnets?quorum=false&recursive=true&sorted=false
Jul 16 02:22:39 node-01 sdnotify-proxy[1246]: go-etcd2015/07/16 02:22:39 DEBUG: send.request.to http://127.0.0.1:4001/v2/keys/coreos.com/network/subnets?quorum=false&recursive=true&sorted=false | method GET
Jul 16 02:22:39 node-01 sdnotify-proxy[1246]: go-etcd2015/07/16 02:22:39 DEBUG: recv.response.from
Jul 16 02:22:39 node-01 sdnotify-proxy[1246]: go-etcd2015/07/16 02:22:39 DEBUG: recv.success
Jul 16 02:22:39 node-01 sdnotify-proxy[1246]: I0716 02:22:39.982519 00001 vxlan.go:273] Handling initial subnet events
Jul 16 02:22:39 node-01 sdnotify-proxy[1246]: I0716 02:22:39.982629 00001 device.go:159] calling GetL2List() dev.link.Index: 5
Jul 16 02:22:39 node-01 sdnotify-proxy[1246]: I0716 02:22:39.983927 00001 device.go:164] calling NeighAdd: 10.10.10.101, 3e:0b:84:a6:f4:68
Jul 16 02:22:39 node-01 sdnotify-proxy[1246]: I0716 02:22:39.984329 00001 device.go:164] calling NeighAdd: 10.10.10.102, 9e:ed:e5:64:6f:0d
Jul 16 02:22:39 node-01 sdnotify-proxy[1246]: go-etcd2015/07/16 02:22:39 DEBUG: rawWatch /coreos.com/network/subnets []
Jul 16 02:22:39 node-01 sdnotify-proxy[1246]: go-etcd2015/07/16 02:22:39 DEBUG: get /coreos.com/network/subnets [http://127.0.0.1:4001]
Jul 16 02:22:39 node-01 sdnotify-proxy[1246]: go-etcd2015/07/16 02:22:39 DEBUG: Connecting to etcd: attempt 1 for keys/coreos.com/network/subnets?recursive=true&wait=true&waitIndex=124
Jul 16 02:22:39 node-01 sdnotify-proxy[1246]: go-etcd2015/07/16 02:22:39 DEBUG: send.request.to http://127.0.0.1:4001/v2/keys/coreos.com/network/subnets?recursive=true&wait=true&waitIndex=124 | method GET
Jul 16 02:22:40 node-01 systemd[1]: Started Network fabric for containers.
Jul 16 02:25:44 node-01 sdnotify-proxy[1246]: go-etcd2015/07/16 02:25:44 DEBUG: recv.response.from
Jul 16 02:25:44 node-01 sdnotify-proxy[1246]: go-etcd2015/07/16 02:25:44 DEBUG: recv.success
Jul 16 02:25:44 node-01 sdnotify-proxy[1246]: go-etcd2015/07/16 02:25:44 DEBUG: rawWatch /coreos.com/network/subnets []
Jul 16 02:25:44 node-01 sdnotify-proxy[1246]: go-etcd2015/07/16 02:25:44 DEBUG: get /coreos.com/network/subnets [http://127.0.0.1:4001]
Jul 16 02:25:44 node-01 sdnotify-proxy[1246]: go-etcd2015/07/16 02:25:44 DEBUG: Connecting to etcd: attempt 1 for keys/coreos.com/network/subnets?recursive=true&wait=true&waitIndex=124
Jul 16 02:25:44 node-01 sdnotify-proxy[1246]: go-etcd2015/07/16 02:25:44 DEBUG: send.request.to http://127.0.0.1:4001/v2/keys/coreos.com/network/subnets?recursive=true&wait=true&waitIndex=124 | method GET
Jul 16 02:25:44 node-01 sdnotify-proxy[1246]: I0716 02:25:44.847546 00001 vxlan.go:232] Subnet added: 172.17.23.0/24
Jul 16 02:25:44 node-01 sdnotify-proxy[1246]: I0716 02:25:44.847593 00001 device.go:164] calling NeighAdd: 10.10.10.103, 6e:dc:8d:d3:fb:76
Jul 16 02:25:44 node-01 sdnotify-proxy[1246]: go-etcd2015/07/16 02:25:44 DEBUG: rawWatch /coreos.com/network/subnets []
Jul 16 02:25:44 node-01 sdnotify-proxy[1246]: go-etcd2015/07/16 02:25:44 DEBUG: get /coreos.com/network/subnets [http://127.0.0.1:4001]
Jul 16 02:25:44 node-01 sdnotify-proxy[1246]: go-etcd2015/07/16 02:25:44 DEBUG: Connecting to etcd: attempt 1 for keys/coreos.com/network/subnets?recursive=true&wait=true&waitIndex=256
Jul 16 02:25:44 node-01 sdnotify-proxy[1246]: go-etcd2015/07/16 02:25:44 DEBUG: send.request.to http://127.0.0.1:4001/v2/keys/coreos.com/network/subnets?recursive=true&wait=true&waitIndex=256 | method GET
Jul 16 02:25:44 node-01 sdnotify-proxy[1246]: go-etcd2015/07/16 02:25:44 DEBUG: recv.response.from
Jul 16 02:25:44 node-01 sdnotify-proxy[1246]: go-etcd2015/07/16 02:25:44 DEBUG: recv.success
Jul 16 02:29:15 node-01 sdnotify-proxy[1246]: go-etcd2015/07/16 02:29:15 DEBUG: recv.response.from
Jul 16 02:29:15 node-01 sdnotify-proxy[1246]: go-etcd2015/07/16 02:29:15 DEBUG: recv.success
Jul 16 02:29:15 node-01 sdnotify-proxy[1246]: go-etcd2015/07/16 02:29:15 DEBUG: rawWatch /coreos.com/network/subnets []
Jul 16 02:29:15 node-01 sdnotify-proxy[1246]: go-etcd2015/07/16 02:29:15 DEBUG: get /coreos.com/network/subnets [http://127.0.0.1:4001]
Jul 16 02:29:15 node-01 sdnotify-proxy[1246]: go-etcd2015/07/16 02:29:15 DEBUG: Connecting to etcd: attempt 1 for keys/coreos.com/network/subnets?recursive=true&wait=true&waitIndex=256
Jul 16 02:29:15 node-01 sdnotify-proxy[1246]: go-etcd2015/07/16 02:29:15 DEBUG: send.request.to http://127.0.0.1:4001/v2/keys/coreos.com/network/subnets?recursive=true&wait=true&waitIndex=256 | method GET
Jul 16 02:29:15 node-01 sdnotify-proxy[1246]: I0716 02:29:15.960902 00001 vxlan.go:232] Subnet added: 172.17.71.0/24
Jul 16 02:29:15 node-01 sdnotify-proxy[1246]: I0716 02:29:15.960974 00001 device.go:164] calling NeighAdd: 10.10.10.104, 0a:5f:c2:00:27:c4
Jul 16 02:29:15 node-01 sdnotify-proxy[1246]: go-etcd2015/07/16 02:29:15 DEBUG: rawWatch /coreos.com/network/subnets []
Jul 16 02:29:15 node-01 sdnotify-proxy[1246]: go-etcd2015/07/16 02:29:15 DEBUG: get /coreos.com/network/subnets [http://127.0.0.1:4001]
Jul 16 02:29:15 node-01 sdnotify-proxy[1246]: go-etcd2015/07/16 02:29:15 DEBUG: Connecting to etcd: attempt 1 for keys/coreos.com/network/subnets?recursive=true&wait=true&waitIndex=434
Jul 16 02:29:15 node-01 sdnotify-proxy[1246]: go-etcd2015/07/16 02:29:15 DEBUG: send.request.to http://127.0.0.1:4001/v2/keys/coreos.com/network/subnets?recursive=true&wait=true&waitIndex=434 | method GET
Jul 16 02:29:15 node-01 sdnotify-proxy[1246]: go-etcd2015/07/16 02:29:15 DEBUG: recv.response.from
Jul 16 02:29:15 node-01 sdnotify-proxy[1246]: go-etcd2015/07/16 02:29:15 DEBUG: recv.success
Bob Aman
  • Can you add your cloud-config to the question? I see you are running version 0.5.0 of flannel; what versions of Docker, CoreOS, and Kubernetes? – Greg Jul 17 '15 at 12:02

1 Answer


Kubernetes expects that you can reach the IP of a pod from any host (or pod) in your Kubernetes cluster. From host 10.10.10.103, can you access the IP of the web server container directly?

curl http://172.17.0.5

Based on the IP addresses I see, I bet not, because that looks like a standard Docker IP address. To get Kubernetes working across multiple hosts, you either need to set up an overlay network using something like flannel, or accomplish the same thing with a dedicated network connecting your hosts (and bridge your Docker containers on each host onto that network).
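On CoreOS the usual pattern (I'm going from the flanneld unit in your question here, so treat this as a sketch) is that flannel writes out the subnet it leased and Docker is restarted with a matching --bip, so that docker0 and the pod IPs land inside that subnet. It's worth checking whether that hand-off is actually happening:

$ cat /run/flannel_docker_opts.env        # generated by the mk-docker-opts.sh ExecStartPost in your unit
$ systemctl cat docker.service            # does it pull in that env file and pass the bridge options to the docker daemon?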

Flannel also sets up routing rules that need to be correct on each host. If you're doing any of this manually, stop, configure flannel correctly, and restart it on all of your systems. Flannel stores its configuration in etcd.
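As a quick sanity check, the network flannel is using and the per-host subnet leases can be read straight out of etcd (the paths below are flannel's defaults, which your unit appears to use):

$ etcdctl get /coreos.com/network/config
$ etcdctl ls /coreos.com/network/subnets

Each host's docker0 address should fall inside the subnet leased to that host.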

If you've already done that, consider adding the output of ip route to your question.
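Something like this on each node would show whether traffic for the other nodes' pod subnets is actually being sent via flannel.1 (172.17.0.5 is just the example address from above):

$ ip route
$ ip route get 172.17.0.5   # which interface and source address would be used for that pod IP?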

larsks
  • There's a `FLANNEL` chain in that `iptables` dump, though that doesn't necessarily mean I've got flannel configured correctly. :-) That said, on 10.10.10.103 I get a timeout, while on 10.10.10.102 I get "no route to host" for 172.17.0.5. – Bob Aman Jul 16 '15 at 02:00
  • If flannel is correctly configured, you would expect to see (a) flannel running, (b) a flannel bridge, and (c) that docker bridge (`docker0`) would have an ip address that was a subset of the flannel network. Based on what I see in your `iptables` dump, your flannel network is using `10.244.0.0/16`, which suggests that something still isn't configured correctly (because the docker container ips are not subsets of this network). – larsks Jul 16 '15 at 02:04
  • OK, looks like (c) might be my problem. Not a subset. Edit: You beat me to it w/ your edit. – Bob Aman Jul 16 '15 at 02:09
  • OK, configured everything to use `172.17.0.0/16` instead, but still no dice. Flannel seems to be reporting `inet 172.17.71.0/16` on node 1, `inet 172.17.23.0/16` on node 2, while `docker0` is using `inet 172.17.42.1/16`. Still getting no route to host for the pod IPs. – Bob Aman Jul 16 '15 at 02:54