One of my microservices is running on Kubernetes. I would like to tell the K8s load balancer when a pod is busy, because the behaviour I currently get is not what I need.
For example:
I have 8 pods running, and each pod can process 1 request at a time. Each request takes 70 to 100% of the CPU core allocated to the pod. But when I send 8 requests to my application, Kubernetes does not dispatch them across the 8 pods; it tries to use only one. And since I limit each replica of the app (via a thread pool) to one thread at a time, the requests are of course queued on pod 1.
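To illustrate the queuing, here is a minimal Python sketch of what happens inside one pod when all 8 requests land on it. It is not my actual service: the names `handle_request` and `simulate_single_pod` are hypothetical, and `time.sleep` stands in for the real CPU-bound work. The only thing it shares with the real setup is the assumption of a thread pool limited to a single worker.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def handle_request(request_id, work_seconds):
    """Stand-in for one CPU-heavy request; returns (id, start, end) timestamps."""
    start = time.monotonic()
    time.sleep(work_seconds)  # placeholder for the real CPU-bound work
    end = time.monotonic()
    return request_id, start, end


def simulate_single_pod(n_requests=8, work_seconds=0.05):
    """Send n_requests to one 'pod' whose pool allows a single worker thread."""
    with ThreadPoolExecutor(max_workers=1) as pool:
        # All requests are accepted immediately, but only one runs at a time;
        # the rest wait in the pool's internal queue.
        futures = [
            pool.submit(handle_request, i, work_seconds)
            for i in range(n_requests)
        ]
        return [f.result() for f in futures]


if __name__ == "__main__":
    t0 = time.monotonic()
    for request_id, start, end in simulate_single_pod():
        print(
            f"request {request_id}: "
            f"started at {start - t0:.2f}s, finished at {end - t0:.2f}s"
        )
```

Each request only starts once the previous one has finished, so the total time grows linearly with the number of requests sent to that pod, while the other 7 pods stay idle.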
So my question is: how can I tell Kubernetes that pod 1 is busy and that the load balancer must dispatch request 2 to pod 2?
Note: For development and testing purposes, I'm using Docker Desktop (Docker for Windows) on Windows 10 with kubectl.