
I'm having trouble getting a Kubernetes IPv6 SingleStack LoadBalancer service to pass the correct source IP address through to pods. It works fine on a sister IPv4 SingleStack LoadBalancer service that passes traffic to the same pods.

The cluster is a bare-metal v1.21.1 dual-stack cluster created with kubeadm, using Calico v3.18 as the CNI and MetalLB to allocate loadbalancer IPs to services of type: LoadBalancer. Calico is then configured to announce the loadbalancer IPs to the local router over BGP. Taking a toy example of a single nginx deployment with two services (one for IPv4, one for IPv6): if I curl the IPv4 loadbalancer address, the nginx access log prints the correct client IP in 192.168.2.0/24:

192.168.2.128 - - [01/Jun/2021:19:32:37 +0000] "GET / HTTP/1.1" 200 612 "-" "curl/7.64.1" "-"

But when curling the IPv6 loadbalancer address from the same client (in 2001:8b0:c8f:e8b0::/64), nginx logs a client IP address of fd5a:1111:1111::f31f:

fd5a:1111:1111::f31f - - [01/Jun/2021:19:34:23 +0000] "GET / HTTP/1.1" 200 612 "-" "curl/7.64.1" "-"

This address is from the cluster's serviceSubnet of fd5a:1111:1111::/112 and happens to be the clusterIP of the IPv6 service. It seems as though something is doing TCP proxying here (ipvs?), but it's not clear why it behaves this way. I'd expect this if externalTrafficPolicy were Cluster; in fact, if I change the services from Local to Cluster, I get the local IP address of the cluster node forwarding the request on IPv4 (as expected), and the same clusterIP address on IPv6. externalTrafficPolicy appears to have no effect in the IPv6 case.
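
For reference, this is a minimal sketch of the checks I've been using to see what's doing the proxying (the kube-proxy ConfigMap name is the kubeadm default, and the greps may need adjusting for your setup):

# Which proxy mode is kube-proxy actually running in (iptables or ipvs)?
kubectl -n kube-system get configmap kube-proxy -o yaml | grep 'mode:'

# If it's ipvs, list the virtual servers for both address families
sudo ipvsadm -Ln

# Look for anything SNATing IPv6 traffic on the node receiving the request
sudo ip6tables -t nat -S | grep -iE 'masquerade|snat'

# Compare with the IPv4 side, which behaves as expected
sudo iptables -t nat -S | grep -iE 'masquerade|snat'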

Am I missing something obvious, or should these services behave in the same way as each other?

Manifests for the test:

---
apiVersion: v1
kind: Service
metadata:
  name: test-service-source-ip-v4
  namespace: default
  labels:
    k8s-app: test-service-source-ip
spec:
  selector:
    k8s-app: test-service-source-ip
  type: LoadBalancer
  ipFamilies:
    - IPv4
  ipFamilyPolicy: SingleStack
  loadBalancerIP: 192.168.254.11
  externalTrafficPolicy: "Local"
  ports:
    - name: http-tcp
      protocol: TCP
      port: 80
---
apiVersion: v1
kind: Service
metadata:
  name: test-service-source-ip-v6
  namespace: default
  labels:
    k8s-app: test-service-source-ip
spec:
  selector:
    k8s-app: test-service-source-ip
  type: LoadBalancer
  ipFamilies:
    - IPv6
  ipFamilyPolicy: SingleStack
  loadBalancerIP: 2001:8b0:c8f:e8b1:beef:f00d::11
  externalTrafficPolicy: "Local"
  ports:
    - name: http-tcp
      protocol: TCP
      port: 80

---
apiVersion: apps/v1
kind: Deployment
metadata:
  namespace: default
  name: test-service-source-ip
  labels:
    k8s-app: test-service-source-ip
spec:
  replicas: 1
  selector:
    matchLabels:
      k8s-app: test-service-source-ip
  template:
    metadata:
      labels:
        k8s-app: test-service-source-ip
    spec:
      containers:
        - name: test-service-source-ip
          image: nginx:1
          ports:
            - containerPort: 80
              protocol: TCP
  • You seem to be misusing ULA addressing because `fd5a:1111:1111::f31f` does not seem to be using a required, _random_ 40-bit Global ID. Also, using network sizes other than `/64` will cause problems with IPv6. If you use a random 40-bit Global ID, you get a `/48` prefix from which you can derive up to 65,536 `/64` networks. You should use IPv6 correctly before trying something like this. – Ron Maupin Jun 01 '21 at 20:19
  • The [kubeadm documentation](https://kubernetes.io/docs/setup/production-environment/tools/kubeadm/dual-stack-support/#create-a-dual-stack-cluster) explicitly shows using a `/112` as an IPv6 service cidr, so it would be surprising if that was a faulty configuration. If you try and set `serviceSubnet` to larger than a `/108`, kubeadm will give you a fatal error: "specified service subnet is too large; for 128-bit addresses, the mask must be >= 108". – growse Jun 01 '21 at 21:13
  • A `/112` network is larger than a `/108` network, so I'm not sure what you mean. [This answer](https://networkengineering.stackexchange.com/a/34172/8499) explains some of the problems you can run into with IPv6 network sizes other than `/64` (exceptions are `/127` for point-to-point links and `/128` for loopbacks). [This answer](https://networkengineering.stackexchange.com/a/57933/8499) explains the ULA requirement of a random Global ID. Both answers reference the relevant RFCs. – Ron Maupin Jun 01 '21 at 21:22
  • A /112 subnet has fewer addresses in it than a /108 subnet (65,536 vs 1m). It is smaller. I don't believe you can currently have a k8s `serviceSubnet` be a `/64`, regardless of what the RFCs might say. As for the ULA requirement, I changed the subnet to use a more random global ID, and it still doesn't work, so I'm not really sure that's the problem. – growse Jun 01 '21 at 21:25
  • Right, the host portion is smaller, but the network portion is larger. You said the network is smaller, but the network is four bits larger, and the host portion is four bits smaller. In any case, you should read the relevant RFCs. You should be completely familiar with how IPv6 works prior to implementing it in a business network. – Ron Maupin Jun 01 '21 at 21:29
  • Ah, I see what you mean about network size. A terminology thing I guess. In any case, this isn't a "business network" and I'm not sure that "read the RFCs" is particularly helpful. But thank you for your insight. – growse Jun 01 '21 at 21:36
  • If this is not a business network, then you are asking on the wrong SE site. You should ask on [su] because, "_Server Fault is for questions about **managing information technology systems in a business environment.**_" – Ron Maupin Jun 01 '21 at 21:45

1 Answer


It turns out that I had an old installation of ip-masq-agent running, which was erroneously configured to NAT IPv6 traffic both into and out of the cluster. I figured this out by looking at the ip6tables rules on the nodes and finding a bunch of MASQUERADE rules that ip-masq-agent had populated.
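
For anyone chasing the same symptom, a minimal sketch of the checks that surfaced this (assuming ip-masq-agent was deployed as a DaemonSet and ConfigMap named ip-masq-agent in kube-system, which may not match your install):

# Is ip-masq-agent (or something similar) still running?
kubectl -n kube-system get daemonsets | grep -i masq

# On a node: any MASQUERADE rules in the IPv6 NAT table?
sudo ip6tables -t nat -S | grep -i masquerade

# Check whether the agent's config excludes the IPv6 ranges you care about
# (the ConfigMap name here is an assumption based on a default install)
kubectl -n kube-system get configmap ip-masq-agent -o yaml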

Removing this deployment from the cluster and rebooting the nodes to clear the leftover ip6tables rules solved the problem.
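
The cleanup was roughly this (a sketch; the DaemonSet and chain names are assumptions based on a default ip-masq-agent install, so check `ip6tables -t nat -S` for the actual chain before flushing anything):

# Remove the stale agent
kubectl -n kube-system delete daemonset ip-masq-agent

# Rebooting the node clears the leftover rules; flushing the agent's NAT chain
# by hand should also work (the chain name below is an assumption, verify it first)
sudo ip6tables -t nat -F IP-MASQ-AGENT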
