
I've set up a Kubernetes cluster with a LoadBalancer-type Service using MetalLB and was able to get the external IP working for it. The only issue is that when I access the LB service IP on port 80 from the master node, only the pods running on the master node respond, and only when the LB maps the request to the IP of a pod running on the master node. When it maps to the IP of one of the pods running on the worker node, the request simply times out.

The same thing happens when I access the LB service IP on port 80 from the worker node: it only returns results when it maps to the IP of a pod on that same worker node, and times out when it maps to the IP of a pod running on the master node.

Below are the details about my cluster:

Pod network CIDR: 10.0.1.0/24

Host system's local CIDR: 192.168.2.0/24

Master system IP: 192.168.2.28

Worker node IP: 192.168.2.32

LB IP range assigned to MetalLB: 192.168.2.89-192.168.2.95 (range has been assigned to master)

Service running on pods: nginx (on port 80)

Pod-1 (on master) -  IP: 10.0.1.3:80 (nginx)

Pod-2 (on worker) -  IP: 10.0.1.7:80 (nginx)

MetalLB Service IP: 192.168.2.89

CNI used: Calico
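
For reference, a Layer 2 MetalLB configuration for that address range would look roughly like the following (a sketch, assuming MetalLB v0.13+ with the CRD-based configuration and Layer 2 mode; older releases use a ConfigMap instead, and the resource names are placeholders):

  # IPAddressPool holds the range handed out to LoadBalancer services
  apiVersion: metallb.io/v1beta1
  kind: IPAddressPool
  metadata:
    name: default-pool            # placeholder name
    namespace: metallb-system
  spec:
    addresses:
      - 192.168.2.89-192.168.2.95
  ---
  # L2Advertisement makes one node answer ARP for the pool's addresses
  apiVersion: metallb.io/v1beta1
  kind: L2Advertisement
  metadata:
    name: default-l2              # placeholder name
    namespace: metallb-system
  spec:
    ipAddressPools:
      - default-pool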

Now, when I do:

  (On Master) # curl -sL 'http://192.168.2.89'

I get a response only when "192.168.2.89" maps to "10.0.1.3". When I execute the above command again and the LB maps to "10.0.1.7", which is on the worker node, it simply times out. The same is the case when I perform the above task on the worker node.
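
Repeating the request a few times shows which backend each attempt lands on (the --max-time flag is only there so the failing attempts don't hang):

  (On Master) # for i in 1 2 3 4 5; do curl -sL --max-time 5 'http://192.168.2.89'; echo; done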

Note that I have modified the index file of each of the pods to better identify which pod is returning the results.
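
For example, the default nginx index in each pod can be overwritten with a marker string (a sketch; the pod names are placeholders and the docroot assumes the stock nginx image):

  # kubectl exec pod-1 -- /bin/sh -c 'echo "pod-1 on master" > /usr/share/nginx/html/index.html'
  # kubectl exec pod-2 -- /bin/sh -c 'echo "pod-2 on worker" > /usr/share/nginx/html/index.html'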

I've also tried accessing the LB IP on port 80 from a machine in the same network that is not part of the Kubernetes cluster. From this machine, however, I only receive results from the pods running on the master node, and it times out when the LB maps to the pod on the worker node.

On Master:

  # curl -sL 'http://192.168.2.89'

Output> Only shows the contents of the index file hosted on the pods running on the Master node.

On Worker:

  # curl -sL 'http://192.168.2.89'

Output> Only shows the contents of the index file hosted on the pods running on the Worker node.

On Client Machine:

  # curl -sL 'http://192.168.2.89'

Output> Only shows the contents of the index file hosted on the pods running on the Master node.
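
The per-node results can be cross-checked against the pod placement and the service's external IP (a sketch; the service name is a placeholder):

  # kubectl get pods -o wide
  # kubectl get svc nginx -o wide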

Below is what I found from the test using network traffic logging:

In the cases when the request fails, the LB IP forwards the request via the cluster-internal IP of the node rather than the node's public IP. That cluster-internal address is only valid inside the cluster and not outside it, so it is unreachable from the client and the request times out.

LB IP: 192.168.2.89

Client machine IP: 192.168.2.34

Pod on Master: 10.0.1.7:80

Pod on Worker: 10.0.1.3:80

Cluster IP of the worker node/Unknown: 10.0.1.81

[Successful Transfer from Client Machine]: # curl -sL 'http://192.168.2.89'

192.168.2.34:35162 --> 192.168.2.89:80

<public_ip_master>:54113 --> 10.0.1.7:80

10.0.1.7:80 --> <public_ip_master>:54113

192.168.2.89:80 --> 192.168.2.34:35162

[Failed Transfer from Client Machine]: # curl -sL 'http://192.168.2.89'

192.168.2.34:42114 --> 192.168.2.89:80

10.0.1.81:58946 --> 10.0.1.3:80

192.168.2.34:42114 --> 192.168.2.89:80

192.168.2.34:42114 --> 192.168.2.89:80
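
For reference, a capture along these lines on the node produces that kind of output (a sketch; the interface and filter are assumptions):

  # tcpdump -i any -nn 'tcp port 80 and (host 192.168.2.89 or net 10.0.1.0/24)'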

Query: I can confirm that there are no firewalls blocking access on either of the systems. What I don't understand is why the load balancer isn't mapping to the public IP of the worker node and only does so for the master node.

From my investigation, I found that the LB only maps to the public IP of the node when it is balancing requests to the master node. When it tries to balance requests to the worker node, it tries to reach the worker node's cluster IP in the 10.0.1.0/24 range directly, rather than the worker node's public IP.

Is this behaviour normal? Can I tweak it?

1 Answer


In my case, using Rancher k3s on Oracle Cloud:

The nodes need to be able to reach the other nodes over UDP port 8472 when Flannel VXLAN is used, or over UDP ports 51820 and 51821 (when using IPv6) when the Flannel WireGuard backend is used. The node should not listen on any other port.

This is because Oracle Cloud blocks UDP port 8472 by default, while Rancher k3s requires UDP port 8472 to be open so that the nodes can reach each other.
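
The cloud-side fix is usually an ingress rule for UDP 8472 in the VCN security list or network security group; as a host-level sketch (assuming iptables, with the node subnet as a placeholder), the rule would look roughly like this:

  # iptables -I INPUT -p udp -s <node_subnet> --dport 8472 -j ACCEPT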

I don't know your exact setup, but I think it may help.
