I have a situation where I create a Node.js cluster using PM2. A single request fired at a worker takes considerable time (2+ minutes) because it runs intensive computations in a pipeline of steps, with a couple of I/O operations at different stages (step 1 downloads over HTTP; an intermediate step and the last step write to disk). The client that sends requests to the cluster throttles them by two factors:
- Frequency (how many requests per second): we use a slow pace (1 per second)
- Concurrency (how many requests can be open at once): we keep this less than or equal to the number of nodes in the cluster
For example, if the cluster has 10 nodes, the client will send only 10 requests, at a rate of 1 per second, and won't send any more until one or more of those requests return with either success or failure (meaning one or more workers should now be free), at which point the client sends more work to the cluster.
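The client-side throttle described above can be sketched roughly as follows; `runThrottled`, `sendRequest`, and the polling interval are hypothetical names and choices, not our actual client:

```javascript
// Cap concurrency at `maxInFlight` (≤ cluster size) and pace new
// requests at one per `intervalMs` (1000 ms = 1 request per second).
async function runThrottled(jobs, sendRequest, maxInFlight = 10, intervalMs = 1000) {
  let inFlight = 0;
  const results = [];
  for (const job of jobs) {
    // Wait until a slot frees up (a previous request succeeded or failed).
    while (inFlight >= maxInFlight) {
      await new Promise(r => setTimeout(r, 50));
    }
    inFlight++;
    results.push(
      sendRequest(job)
        .catch(err => err)                  // a failure also frees a slot
        .finally(() => { inFlight--; })
    );
    // Frequency cap: pause before dispatching the next request.
    await new Promise(r => setTimeout(r, intervalMs));
  }
  return Promise.all(results);
}
```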
While watching the load on the server, it seems the load balancer does not distribute work evenly, as one would expect from a classic round-robin distribution scheme. Instead, a single worker (usually the first one) receives a lot of requests while there are free workers in the cluster. This eventually causes that worker to malfunction.
We implemented a mechanism that prevents a worker from accepting a new request while it's still working on a previous one. This stopped the malfunctioning, but many requests are still denied service even though the cluster has vacant workers!
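The guard we added amounts to something like the wrapper below (illustrative only; `rejectWhileBusy` is a hypothetical name, and in the real service the rejection is surfaced as an HTTP 503 rather than a thrown error):

```javascript
// Wrap a long-running handler so that overlapping calls are rejected
// while a previous one is still in progress.
function rejectWhileBusy(handler) {
  let busy = false;
  return async (...args) => {
    if (busy) throw new Error('worker busy'); // mapped to HTTP 503 in practice
    busy = true;
    try {
      return await handler(...args);
    } finally {
      busy = false;                           // free the worker on success or failure
    }
  };
}
```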
Can you think of a reason why this behavior is happening, or how we can improve the way PM2 distributes work?