0

It seems to me that round-robin load-balancing by HAProxy is not always round-robin. Or is it a bug?

I am load-balancing hive-server2 instances. I have 3 instances to serve Hive queries. Client is beeline. When I am running my long tests on this cluster of computers, the instances are restarted a couple of times, and one time, even though all instances on the HAProxy stats page are shown in UP status (and I manually checked that the server is up), only two of the instances serve all queries for a good 35-40 minutes, when a larger load arrives, and from that point, everything is back to normal.

So, what I am seeing on the stats page is that all instances were restarted 30 minutes ago, and while server 2 and 3 served 4-5 client connections each, server 1 did not serve any. Then suddenly, server 1 gets to serve 9 clients in parallel, and the other two servers get 4-5, in parallel, each, and true round-robin balancing is back to normal, afterwards.

Shouldn't HAProxy distribute the sessions evenly across all servers that are up, accessible, and serving less than the maximum allowed sessions when it is configured to do round-robin balancing?

Aron
  • 43
  • 5

1 Answers1

0

By default, HAProxy should send requests to all servers defined int the load-balancing scheme, but you have more factors contributing this, like: timeout connect, timeout client, timeout server, cookie. You can try tweaking those parameters.

Keep in mind that there is a difference if you are load-balancing (Layer 7) http requests or (Layer 4) TCP requests. For example, maybe the HTTP server is returning 400s or 500s responses, but the TCP connection to the server is ok.

Luminance
  • 820
  • 1
  • 10
  • 24