I'm running an AWS Network Load Balancer that targets an HAProxy instance (a single pod) inside a Kubernetes cluster (3 worker nodes), which in turn exposes some services running in the cluster. This ingress is configured with the Kubernetes AWS Load Balancer Controller, which registers the target nodes by instance ID and exposes the k8s services via NodePort. The NLB has the "internal" scheme, as everything is routed over private IPs. "Preserve client IP" is disabled, as it created some asymmetric routing.
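For reference, the Service is set up roughly like this (names and ports are illustrative; the annotations are the part that matters):

```yaml
# Sketch of the Service exposing HAProxy via the AWS Load Balancer Controller.
apiVersion: v1
kind: Service
metadata:
  name: haproxy-ingress
  namespace: ingress
  annotations:
    # Managed by the AWS Load Balancer Controller (provisions an NLB)
    service.beta.kubernetes.io/aws-load-balancer-type: "external"
    # Register worker nodes by instance ID; traffic reaches the pod via NodePort
    service.beta.kubernetes.io/aws-load-balancer-nlb-target-type: "instance"
    # Internal scheme: everything stays on private IPs
    service.beta.kubernetes.io/aws-load-balancer-scheme: "internal"
    # Proxy protocol v2 towards the targets (HAProxy must expect it too)
    service.beta.kubernetes.io/aws-load-balancer-proxy-protocol: "*"
    # Client IP preservation disabled on the target group
    service.beta.kubernetes.io/aws-load-balancer-target-group-attributes: "preserve_client_ip.enabled=false"
spec:
  type: LoadBalancer
  selector:
    app: haproxy
  ports:
    - name: http
      port: 80
      targetPort: 8080
```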
What's going wrong / what works:
- I can reach everything normally from "inside" (= the account-local network, going only through route tables, transit gateways, and security groups)
- I get roughly an 8% connection-timeout rate from "outside" (= another account, behind a network firewall)
- Whichever IP the NLB's DNS name resolves to, the timeout rate is the same
- I have no problem reaching other services through the same firewall that are not behind that NLB (which makes me think it's not a firewall issue)
- I use the same kind of NLB setup to target instances that are not in Kubernetes (e.g. Elasticsearch), and that works fine
- When I disable the proxy protocol on both HAProxy and the NLB (see the sketch after this list), the error rate drops to 0.8%
- I have no problem with a Classic Load Balancer (which is the solution I ended up choosing)
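On the HAProxy side, proxy protocol is enabled roughly like this (frontend/backend names, ports and addresses are illustrative); disabling it means dropping `accept-proxy` here and the proxy-protocol annotation on the Service:

```
# haproxy.cfg (sketch) - accept-proxy makes HAProxy require the PROXY
# protocol header that the NLB sends when proxy protocol v2 is enabled
frontend fe_ingress
    bind *:8080 accept-proxy
    mode tcp
    default_backend be_services

backend be_services
    mode tcp
    server svc1 10.0.0.10:8080 check
```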
I'm more or less happy with the Classic LB solution, but I'm extremely frustrated by not understanding what is happening, and I hope someone can explain what is going on.