The documentation here describes how to set up an HTTP(S) utilisation-based load balancer with Kubernetes on Google Cloud Platform.
My question is how it actually performs utilisation-based load balancing. For example, consider the following configuration:
- a 10-node instance group
- a replication controller with 3 pods, deployed to that instance group
- a NodePort service that exposes port X on every node in the instance group
Assuming the load balancer chooses the least utilised of the 10 nodes and routes to it on port X, how is a pod chosen to serve the request? Does the Kubernetes service then select a pod using some other balancing algorithm?
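As a rough sketch of the two hops being asked about, the model below assumes kube-proxy runs in iptables mode with the default `externalTrafficPolicy: Cluster`. Under that assumption, a node receiving NodePort traffic forwards it to a randomly chosen endpoint anywhere in the cluster, not necessarily one on that node. The node names, pod placement and utilisation figures are hypothetical, and `pick_node`, `pick_pod` and `route` are illustrative helpers, not Kubernetes or GCP APIs.

```python
import random
from collections import Counter

# Hypothetical cluster matching the configuration above: 10 nodes, 3 pods.
NODES = [f"node-{i}" for i in range(10)]
# Assumption: the 3 replicas happen to be scheduled on nodes 0, 1 and 2.
POD_HOSTS = {"pod-a": "node-0", "pod-b": "node-1", "pod-c": "node-2"}


def pick_node(utilisation):
    """Hop 1 (load balancer): choose the least utilised node."""
    return min(utilisation, key=utilisation.get)


def pick_pod(rng):
    """Hop 2 (kube-proxy, assumed iptables mode, externalTrafficPolicy: Cluster):
    forward to a uniformly random endpoint, regardless of which node it runs on."""
    return rng.choice(sorted(POD_HOSTS))


def route(utilisation, rng):
    """Return (receiving node, serving pod, whether an extra node-to-node hop occurs)."""
    node = pick_node(utilisation)
    pod = pick_pod(rng)
    extra_hop = POD_HOSTS[pod] != node
    return node, pod, extra_hop


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical utilisation: node-9 is the least busy and runs no pod.
    utilisation = {n: 0.5 for n in NODES}
    utilisation["node-9"] = 0.1
    counts = Counter()
    extra_hops = 0
    for _ in range(3000):
        node, pod, extra = route(utilisation, rng)
        counts[pod] += 1
        extra_hops += extra
    print(counts, "extra hops:", extra_hops)
```

Under these assumptions the least utilised node (`node-9`) hosts no pod, so every request is forwarded a second time, and the three pods share the traffic roughly equally whichever node the load balancer picked. Other proxy modes, or `externalTrafficPolicy: Local`, would behave differently.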
Clearly something non-obvious is happening, because most nodes will not be running a pod (and so might be more likely to be the least utilised).
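To make this concern concrete, here is a toy model under two hypothetical simplifications: the balancer's utilisation signal is per-node CPU, and the pods are the only significant CPU consumers. The `BASELINE` and `POD_LOAD` constants and the pod placement are made up for illustration; this is not a description of how GCP actually measures utilisation.

```python
# Toy model: which node looks "least utilised" when only 3 of 10 nodes run a pod?
# Hypothetical simplification: node CPU = idle baseline + load of each pod it hosts.
BASELINE = 0.05   # assumed idle CPU fraction per node
POD_LOAD = 0.40   # assumed CPU fraction consumed by one busy pod

NODES = [f"node-{i}" for i in range(10)]
# Assumption: the replication controller scheduled its 3 pods on nodes 0-2.
POD_HOSTS = {"pod-a": "node-0", "pod-b": "node-1", "pod-c": "node-2"}


def node_utilisation(nodes, pod_hosts):
    """Return an estimated CPU utilisation for every node."""
    util = {n: BASELINE for n in nodes}
    for host in pod_hosts.values():
        util[host] += POD_LOAD
    return util


def least_utilised(util):
    """The node a utilisation-based balancer would favour (first one on ties)."""
    return min(util, key=util.get)


if __name__ == "__main__":
    util = node_utilisation(NODES, POD_HOSTS)
    hosts = set(POD_HOSTS.values())
    idle = [n for n in NODES if n not in hosts]
    print("least utilised:", least_utilised(util))
    print("nodes with no pod:", len(idle))
```

Running it reports `node-3` as the least utilised node and 7 nodes with no pod: in this model the balancer's favourite is always a node that cannot serve the request locally.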