
I have a small on-prem Kubernetes cluster (Rancher 2.3.6) consisting of three nodes. The deployments inside the cluster are provisioned dynamically by an external application and always have their replica count set to 1, because these are stateful applications and high availability is not needed.

The applications are exposed to the internet by NodePort services with a random port and externalTrafficPolicy set to Cluster. So if a user requests any of the three nodes, kube-proxy routes and SNATs the request to the node running the application pod.
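
For reference, the dynamically created services look roughly like this (the name and ports are placeholders, not our actual values):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: my-stateful-app            # hypothetical name
spec:
  type: NodePort
  externalTrafficPolicy: Cluster   # the default; kube-proxy may SNAT and forward across nodes
  selector:
    app: my-stateful-app
  ports:
    - port: 8080
      targetPort: 8080
      nodePort: 31234              # assigned randomly from the NodePort range
```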

To this point, everything works fine.

The problem started when we added applications that rely on the request's source IP. Since SNAT replaces the client's IP with an internal node IP, these applications don't work properly.

I know that setting the service's externalTrafficPolicy to Local will disable SNAT. But this will also break the architecture, because not every node has an instance of the application running.

Is there a way to preserve the original client IP and still make use of the internal routing, so I won't have to worry about which node the request lands on?

frinsch

3 Answers


It depends on how the traffic gets into your cluster. But let's break it down a little bit:

Generally, there are two strategies for handling source IP preservation:

  • SNAT (packet IP)
  • proxy/header (passing the original IP in an additional header)

1) SNAT

By default, packets to NodePort and LoadBalancer services are SourceNAT'd (to the IP of the node that received the request), while packets sent to a ClusterIP are not SourceNAT'd.

As you mentioned already, there is a way to turn off SNAT for NodePort and LoadBalancer services by setting service.spec.externalTrafficPolicy: Local, which preserves the original source IP address, but with the undesired effect that kube-proxy only proxies requests to local endpoints and does not forward traffic to other nodes.
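
For illustration, a minimal sketch of such a Service (the name and ports are placeholders):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: my-app                   # hypothetical name
spec:
  type: NodePort
  externalTrafficPolicy: Local   # client IP preserved; only nodes running a pod answer
  selector:
    app: my-app
  ports:
    - port: 8080
      targetPort: 8080
```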

2) Header + Proxy IP preservation

a) Nginx Ingress Controller and L7 LoadBalancer

  • When using an L7 LoadBalancer that sends an X-Forwarded-For header, Nginx by default evaluates the header containing the source IP if we have set the LB's CIDR/address in proxy-real-ip-cidr
  • you might need to set use-forwarded-headers explicitly to make Nginx evaluate the header information
  • additionally you might want to enable enable-real-ip so the realip_module replaces the client IP with the one that has been set in the X-Forwarded-For header by the trusted LB specified in proxy-real-ip-cidr (see the ConfigMap sketch after this list).
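
A minimal ConfigMap sketch tying these options together (the controller name/namespace and the CIDR are placeholders for your installation and your LB's address range):

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: ingress-nginx-controller        # depends on your installation
  namespace: ingress-nginx
data:
  use-forwarded-headers: "true"         # trust and evaluate X-Forwarded-For
  enable-real-ip: "true"                # let the realip_module rewrite the source IP
  proxy-real-ip-cidr: "203.0.113.0/24"  # hypothetical CIDR of the trusted L7 LB
```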

b) Proxy Protocol and L4 LoadBalancer

  • With use-proxy-protocol: "true" enabled, the header is not evaluated; instead, the connection details are sent ahead of the actual TCP connection. The LBs must support this (see the sketch below).
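
A sketch of the corresponding ConfigMap entry (same placeholder controller name/namespace as above):

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: ingress-nginx-controller   # depends on your installation
  namespace: ingress-nginx
data:
  use-proxy-protocol: "true"       # read client details from the PROXY protocol preamble
```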
Martin Peter

Web applications can be exposed outside the cluster using Ingress instead of NodePorts. An Ingress object can point to the app's Deployment through a Service, while the ingress controller itself is exposed by a NodePort or LoadBalancer Service where you can configure service.spec.externalTrafficPolicy: Local to preserve the source IP address. You can have an external load balancer pointing to the nodes of the cluster where ingress controller pods are for traffic routing purposes.
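
A rough sketch of that setup, assuming an ingress-nginx installation and the networking.k8s.io/v1beta1 Ingress API (all names and the hostname are placeholders):

```yaml
# Expose the ingress controller itself with the Local policy so it receives
# the real client IP (the policy is only valid on NodePort/LoadBalancer Services).
apiVersion: v1
kind: Service
metadata:
  name: ingress-nginx-controller   # hypothetical; depends on your installation
  namespace: ingress-nginx
spec:
  type: NodePort
  externalTrafficPolicy: Local
  selector:
    app.kubernetes.io/name: ingress-nginx
  ports:
    - name: http
      port: 80
      targetPort: 80
---
# Route a hostname to the application's ClusterIP Service.
apiVersion: networking.k8s.io/v1beta1
kind: Ingress
metadata:
  name: my-app                     # hypothetical name
spec:
  rules:
    - host: app.example.com        # hypothetical hostname
      http:
        paths:
          - path: /
            backend:
              serviceName: my-app
              servicePort: 80
```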

Reference: https://kubernetes.io/docs/tasks/access-application-cluster/create-external-load-balancer/

leodotcloud
  • Thanks for your reply. The default Nginx ingress is not an option because the applications are using UDP and TCP connections for different layer 7 protocols. As the Kubernetes article says, external load balancers are only available in cloud environments like Amazon or Google. Since we have a local cluster in our own datacenter, this is not really an option – frinsch Apr 15 '20 at 08:55

You can set up the MetalLB LoadBalancer in layer2 mode and use externalTrafficPolicy: Local.

In the MetalLB docs you can read:

When announcing in layer2 mode, one node in your cluster will attract traffic for the service IP. From there, the behavior depends on the selected traffic policy.

In this mode only nodes which have Service endpoints (pod replicas > 0) will serve the incoming traffic; of course, in this mode client source IPs are preserved.
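
A minimal sketch, assuming MetalLB's ConfigMap-based configuration from the 0.x releases (the address range and service name are placeholders):

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  namespace: metallb-system
  name: config
data:
  config: |
    address-pools:
    - name: default
      protocol: layer2
      addresses:
      - 192.168.1.240-192.168.1.250   # hypothetical range from your on-prem subnet
---
apiVersion: v1
kind: Service
metadata:
  name: my-app                        # hypothetical name
spec:
  type: LoadBalancer
  externalTrafficPolicy: Local        # client source IP is preserved
  selector:
    app: my-app
  ports:
    - port: 8080
      targetPort: 8080
```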

Matt