I'm trying to dive into the K8s networking model and I think I have a pretty good understanding of it so far, but there is one thing that I can't get my head around. In the Cluster Networking guide, the following is mentioned:
Kubernetes imposes the following fundamental requirements on any networking implementation (barring any intentional network segmentation policies):
- all containers can communicate with all other containers without NAT
- all nodes can communicate with all containers (and vice-versa) without NAT
- the IP that a container sees itself as is the same IP that others see it as
The second bullet point specifies that cross-node container communication should be possible without NAT. However, this does not seem to hold when kube-proxy runs in iptables mode. Here is a dump of the relevant iptables rules from one of my nodes:
Chain POSTROUTING (policy ACCEPT)
target prot opt source destination
KUBE-POSTROUTING all -- anywhere anywhere /* kubernetes postrouting rules */
Chain KUBE-POSTROUTING (1 references)
target prot opt source destination
MASQUERADE all -- anywhere anywhere /* kubernetes service traffic requiring SNAT */ mark match 0x4000/0x4000
/* sample target pod chain being marked for MASQ */
Chain KUBE-SEP-2BKJZA32HM354D5U (1 references)
target prot opt source destination
KUBE-MARK-MASQ all -- xx.yyy.zzz.109 anywhere /* kube-system/heapster: */
DNAT tcp -- anywhere anywhere /* kube-system/heapster: */ tcp to:xx.yyy.zzz.109:8082
Chain KUBE-MARK-MASQ (156 references)
target prot opt source destination
MARK all -- anywhere anywhere MARK or 0x4000
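To make the wiring between these chains clearer, here is a minimal sketch of the iptables commands that would produce rules like the ones above. The chain names and the 0x4000 mark are taken from the dump; the exact flags are my reconstruction, not kube-proxy's actual code:

# Reconstruction for illustration only - kube-proxy generates these rules
# programmatically. Chain names and the mark value match the dump above.
iptables -t nat -N KUBE-MARK-MASQ
iptables -t nat -A KUBE-MARK-MASQ -j MARK --set-xmark 0x4000/0x4000
iptables -t nat -N KUBE-POSTROUTING
iptables -t nat -A POSTROUTING -m comment --comment "kubernetes postrouting rules" -j KUBE-POSTROUTING
# MASQUERADE (SNAT to the node's IP) only packets carrying the 0x4000 mark:
iptables -t nat -A KUBE-POSTROUTING -m mark --mark 0x4000/0x4000 -j MASQUERADE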
It looks like K8s is rewriting the source IP of marked outbound packets to the node's IP (for a ClusterIP service). And this is even mentioned explicitly in Source IP for Services with Type=ClusterIP:
Packets sent to ClusterIP from within the cluster are never source NAT’d if you’re running kube-proxy in iptables mode, which is the default since Kubernetes 1.2. If the client pod and server pod are in the same node, the client_address is the client pod’s IP address. However, if the client pod and server pod are in different nodes, the client_address is the client pod’s node flannel IP address.
This starts by saying packets within the cluster are never SNAT'd, but then proceeds to say that packets sent to pods on other nodes are in fact SNAT'd. I'm confused about this - am I somehow misinterpreting the "all nodes can communicate with all containers (and vice-versa) without NAT" requirement?
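In case it helps, the behavior is easy to reproduce with the echo server from that same Source IP tutorial (the image and resource names below come from the tutorial, not from my cluster; the ClusterIP is a placeholder):

# Deploy an echo server that reports the client_address it sees
kubectl create deployment source-ip-app --image=registry.k8s.io/echoserver:1.4
kubectl expose deployment source-ip-app --name=clusterip --port=80 --target-port=8080
# Start a client pod; for the cross-node case it must land on a
# different node than the endpoint (otherwise you'll see the pod IP)
kubectl run busybox --rm -it --restart=Never --image=busybox:1.28 -- sh
# inside the shell (substitute the service's actual ClusterIP):
wget -qO - http://<cluster-ip>
# client_address comes back as the client node's flannel IP, not the pod IP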