4

I'm currently unable to access, ping, or connect to any service outside of Google from my private Kubernetes cluster. The pods run Alpine Linux.

Routing Tables

/sleepez/api # ip route show table all
default via 10.52.1.1 dev eth0
10.52.1.0/24 dev eth0 scope link  src 10.52.1.4
broadcast 10.52.1.0 dev eth0 table local scope link  src 10.52.1.4
local 10.52.1.4 dev eth0 table local scope host  src 10.52.1.4
broadcast 10.52.1.255 dev eth0 table local scope link  src 10.52.1.4
broadcast 127.0.0.0 dev lo table local scope link  src 127.0.0.1
local 127.0.0.0/8 dev lo table local scope host  src 127.0.0.1
local 127.0.0.1 dev lo table local scope host  src 127.0.0.1
broadcast 127.255.255.255 dev lo table local scope link  src 127.0.0.1
local ::1 dev lo  metric 0
local fe80::ac29:afff:fea1:9357 dev lo  metric 0
fe80::/64 dev eth0  metric 256
ff00::/8 dev eth0  metric 256
unreachable default dev lo  metric -1  error -101

The pod certainly has an assigned IP and has no problem connecting to its gateway:

PS C:\...\> kubectl get pods -o wide -n si-dev
NAME                              READY     STATUS    RESTARTS   AGE       IP          NODE
sleep-intel-api-79bf57bd9-c4l8d   1/1       Running   0          52m       10.52.1.4   gke-sez-production-default-pool-74b75ebc-6787

ip addr output

1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN qlen 1
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
3: eth0@if7: <BROADCAST,MULTICAST,UP,LOWER_UP,M-DOWN> mtu 1460 qdisc noqueue state UP
    link/ether 0a:58:0a:34:01:04 brd ff:ff:ff:ff:ff:ff
    inet 10.52.1.4/24 scope global eth0
       valid_lft forever preferred_lft forever
    inet6 fe80::ac29:afff:fea1:9357/64 scope link
       valid_lft forever preferred_lft forever

Pinging Gateway Works

/sleepez/api # ping 10.52.1.1
PING 10.52.1.1 (10.52.1.1): 56 data bytes
64 bytes from 10.52.1.1: seq=0 ttl=64 time=0.111 ms
64 bytes from 10.52.1.1: seq=1 ttl=64 time=0.148 ms
64 bytes from 10.52.1.1: seq=2 ttl=64 time=0.137 ms
^C
--- 10.52.1.1 ping statistics ---
3 packets transmitted, 3 packets received, 0% packet loss
round-trip min/avg/max = 0.111/0.132/0.148 ms

Pinging 1.1.1.1 Fails

/sleepez/api # ping 1.1.1.1
PING 1.1.1.1 (1.1.1.1): 56 data bytes
^C
--- 1.1.1.1 ping statistics ---
6 packets transmitted, 0 packets received, 100% packet loss
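To rule out ICMP-only filtering, here is a sketch of an extra TCP-based check (my addition, using the BusyBox wget shipped with Alpine); in this scenario it times out the same way, pointing at a missing route/NAT rather than ICMP being blocked:

```shell
# Hypothetical extra diagnostic: if only ICMP were filtered, a TCP
# connection would still succeed. Here it times out too, suggesting
# there is no route/NAT to the internet at all.
wget -T 5 -q -O /dev/null http://1.1.1.1
echo "exit code: $?"   # non-zero on timeout
```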

System Services Status

PS C:\...\> kubectl get deploy -n kube-system
NAME                    DESIRED   CURRENT   UP-TO-DATE   AVAILABLE   AGE
event-exporter-v0.1.7   1         1         1            1           18m
heapster-v1.4.3         1         1         1            1           18m
kube-dns                2         2         2            2           18m
kube-dns-autoscaler     1         1         1            1           18m
l7-default-backend      1         1         1            1           18m
tiller-deploy           1         1         1            1           14m

Traceroute (Google Internal)

/sleepez/api # traceroute -In 74.125.69.105
 1  10.52.1.1  0.007 ms  0.006 ms  0.006 ms
 2  *  *  *
 3  *  *  *
 4  *  *

Traceroute (External)

traceroute to 1.1.1.1 (1.1.1.1), 30 hops max, 46 byte packets
 1  10.52.1.1  0.009 ms  0.003 ms  0.004 ms
 2  *  *  *
 3  *  *  *
 [continues...]
Evan Darwin
  • Including a traceroute from the VM to an IP address outside of Google as well as a traceroute from outside of Google to the external IP address of your VM will make this problem a lot easier to debug. – kasperd May 07 '18 at 20:23
  • @kasperd I included the traceroutes from inside the pod to an internal Google. I'm not sure how the external traceroute will help since it terminates at the Google-managed K8s cluster... – Evan Darwin May 07 '18 at 20:36
  • The other traceroute I was asking for was from an external network to the external IP address of your VM. – kasperd May 07 '18 at 20:44
  • With a private GKE cluster the Compute nodes don't receive an external IP address. I can add one to provide the debugging info, but I still doubt that that is the issue – Evan Darwin May 07 '18 at 20:48

3 Answers

8

Nodes in a private GKE cluster have only internal IP addresses, so by default they cannot reach destinations outside of Google's network. See: https://cloud.google.com/kubernetes-engine/docs/how-to/private-clusters#pulling_a_container_image_from_a_registry
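A quick way to confirm this (my sketch; assumes the gcloud project is already configured) is to list the node VMs and note that the external NAT IP column comes back empty for a private cluster:

```shell
# List GKE node VMs with their internal IP and (absent) external NAT IP.
# In a private cluster the natIP column is empty.
gcloud compute instances list \
  --filter="name ~ ^gke-" \
  --format="table(name, networkInterfaces[0].networkIP, networkInterfaces[0].accessConfigs[0].natIP)"
```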

Rohit Agarwal
0

I have two private GKE clusters that currently access the internet. I thought I achieved this by using the NAT Gateway, but after adding a third cluster it is not working there; I suspect the difference is the Kubernetes version on the clusters.

Without question, your other private-IP servers/nodes can access your private cluster through Services, and your pods can access the internet through (I think) NAT.

Shawn
0

I just finished debugging this in a lab cluster.

Test-Driven Development says you should come up with a test first.

1st test of internet access from private GKE node:

  1. In the GCP Console GUI you can SSH to a private node through the browser (even if the GKE node has only a private IP and there is no bastion host or NAT/internet gateway). It didn't work for me until I ran the following firewall rule, which I derived from the docs.
  2. gcloud compute --project=$PROJECT firewall-rules create ssh-from-browser --direction=INGRESS --priority=500 --network=lab-vpc --action=ALLOW --rules=tcp:22 --source-ranges=35.235.240.0/20
  3. After adding the firewall rule above, a new SSH browser session worked (the original session, opened before I added the rule, continued to fail for some reason).
  4. Internet test via GKE node based on COS:
    curl ifconfig.me
    (It hung and eventually timed out; ping isn't installed on COS)

2nd test of internet access from GKE pod:

alias k=kubectl
k run -it busybox --image=busybox -- /bin/sh
exit
k exec -it busybox -- ping 8.8.8.8

The ping hangs until you press Ctrl+C to break out, and the summary shows 100% packet loss (so no internet access).
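As a side note (my addition, not part of the original test plan), a DNS lookup from the pod can still succeed even without internet access, because kube-dns is reachable inside the VPC, so DNS success alone doesn't prove connectivity:

```shell
# Cluster DNS is served by kube-dns inside the VPC, so this works
# even when the pod has no route to the internet.
k exec -it busybox -- nslookup kubernetes.default
```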



The following doc has a "Requirements, restrictions, and limitations" section that explains that Cloud NAT is needed:
https://cloud.google.com/kubernetes-engine/docs/how-to/private-clusters#req_res_lim

In a private cluster, nodes only have internal IP addresses, which means that nodes and Pods are isolated from the internet by default.
...
All nodes in a private cluster are created without a public IP; they have limited access to Google APIs and services. To provide outbound internet access for your private nodes, you can use Cloud NAT.



Here's the solution I came up with; most of it is derived from this gist:
https://gist.github.com/mikesparr/9f522b00b4d3c32227b2ae179260c6e4

export NETWORK_NAME="lab-vpc"
export GCP_REGION="us-central1"
export CLOUD_ROUTER_NAME="router-1"
export CLOUD_ROUTER_ASN="64523"
export NAT_GW_NAME="nat-gateway-1"

gcloud compute routers create $CLOUD_ROUTER_NAME \
    --network $NETWORK_NAME \
    --asn $CLOUD_ROUTER_ASN \
    --region $GCP_REGION

gcloud compute routers nats create $NAT_GW_NAME \
 --router=$CLOUD_ROUTER_NAME \
 --region=$GCP_REGION \
 --auto-allocate-nat-external-ips \
 --nat-all-subnet-ip-ranges 
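After creating the gateway, its configuration can be verified with the same variables as above:

```shell
# Show the NAT config attached to the Cloud Router
gcloud compute routers nats describe $NAT_GW_NAME \
    --router=$CLOUD_ROUTER_NAME \
    --region=$GCP_REGION
```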


Both internet access tests will now work.

1st internet test via GKE node (SSH'd through the GCP Console GUI):

Private-GKE-Node-Bash ~ $ curl ifconfig.me
(Now prints the WAN IP of the NAT gateway)

2nd internet test via GKE pod:

alias k=kubectl
k exec -it busybox -- ping 8.8.8.8

PING 8.8.8.8 (8.8.8.8): 56 data bytes
64 bytes from 8.8.8.8: seq=0 ttl=114 time=1.394 ms
(repeats until you press Ctrl+C to break out)

Both tests confirm internet access after the change.

neoakris