
I'm attempting to use HAProxy 1.4.22 with URI balancing and hash-type consistent to load-balance across 3 Varnish cache backends. My understanding is that this will never accomplish a perfect balance between servers, but it should be better than the results I'm seeing.

The relevant part of my HAProxy config looks like this:

backend varnish
    # hash balancing
    balance uri
    hash-type consistent

    server varnish1 10.0.0.1:80 check observe layer7 maxconn 5000 id 1 weight 75 
    server varnish2 10.0.0.2:80 check observe layer7 maxconn 5000 id 2 weight 50
    server varnish3 10.0.0.3:80 check observe layer7 maxconn 5000 id 3 weight 50

I've been self-testing by pointing my own hosts file at the new proxy server. I even tried re-routing the popular homepage to a separate, round-robin-balanced backend to get that outlier off the hash-balanced backend, and that seems to work fine. I boosted varnish1 to a weight of 75 as a test, but it didn't seem to help. My load is being balanced very disproportionately, and I don't understand why.

(Screenshots: close-up stats and full stats.)

One interesting tidbit is that if I reverse the IDs, the higher ID will ALWAYS get the lion's share of the traffic. Why would the ID affect balancing?

Tweaking weights is all well and good, but as my site's traffic patterns change (we are a news site, and the most popular post can change rapidly), I don't want to have to constantly tweak weights. I understand it will never be in perfect balance, but I was expecting better results than one server with a lower weight getting 25 times more connections than another server with a higher weight.

My goal has been to reduce DB and app server load by reducing duplication at the cache level, which is what HAProxy URI balancing is recommended for, but if it's going to be this far out of balance it won't work for me at all.

Any advice?

Pax

2 Answers


I'm not sure how helpful this is, but I've struggled with the same problem, and this is what I've concluded:

Hash-based load balancing will, as you've already established, never give you perfect load balancing. The behavior you see can be explained simply by a few of the most-visited or largest pages landing on the same server: when a few pages get a lot of traffic and many pages get little, that alone is enough to skew the statistics.
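To see how a single hot page skews things, here is a toy simulation (not HAProxy's actual hash, just an illustrative Python sketch; the URIs and traffic mix are made up) where one homepage accounts for half of all requests:

```python
import hashlib
import random

def server_for(uri, n_servers=3):
    # toy URI hash: every request for the same URI lands on the same backend
    return int(hashlib.md5(uri.encode()).hexdigest(), 16) % n_servers

random.seed(1)
pages = [f"/post/{i}" for i in range(1000)]
# news-site-like traffic: one hot page accounts for half of all requests
traffic = ["/"] * 5000 + [random.choice(pages) for _ in range(5000)]

counts = [0] * 3
for uri in traffic:
    counts[server_for(uri)] += 1
print(counts)  # whichever backend owns "/" carries at least half the load
```

However even the weights are, the backend that happens to own the hot URI carries its full share of that traffic, which is exactly the skew described above.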

Your configuration uses consistent hashing. The IDs and server weights determine which server a hashed entry is ultimately directed to, which is why your balancing is affected by them. The documentation is fairly clear that even though this is a good algorithm for balancing caches, it may require you to shuffle the IDs around and increase the total weight of the servers to get a more even distribution.
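The general idea behind consistent hashing can be sketched like this (a simplified Python model, not HAProxy's implementation: server names, IDs, and weights are taken from the question's config, and one virtual ring node per unit of weight is an assumption):

```python
import hashlib

def ring_hash(key):
    # 32-bit position on the hash ring
    return int(hashlib.md5(key.encode()).hexdigest(), 16) % (1 << 32)

def build_ring(servers):
    # one virtual node per unit of weight; node positions are derived
    # from the server ID, so the ID choice decides where a server's
    # nodes land on the ring
    ring = []
    for name, sid, weight in servers:
        for i in range(weight):
            ring.append((ring_hash(f"{sid}-{i}"), name))
    ring.sort()
    return ring

def pick(ring, uri):
    # walk clockwise to the first node at or after the URI's hash
    p = ring_hash(uri)
    for point, name in ring:
        if point >= p:
            return name
    return ring[0][1]  # wrap around to the start of the ring

servers = [("varnish1", 1, 75), ("varnish2", 2, 50), ("varnish3", 3, 50)]
ring = build_ring(servers)
counts = {}
for n in range(10000):
    name = pick(ring, f"/article/{n}")
    counts[name] = counts.get(name, 0) + 1
print(counts)
```

With uniformly unique URIs the split roughly follows the weights, but real traffic is anything but uniform, and the exact node positions (and therefore the IDs) decide who owns each hot URI.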

If you take a large sample of unique addresses (more than 1000) and visit each of them once, you should see that the session counters are much more equal across the three backends than when you allow 'ordinary' traffic against the balancer, since that is affected by the site's traffic pattern as well.

My advice would be to make sure that you hash the entire URL, not just the part to the left of the "?". This is controlled by using balance uri whole in the configuration; see the HAProxy documentation. If you have a lot of URLs with the same base but varying GET parameters, this will definitely give you improved results.
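The effect is easy to demonstrate with a toy hash (a Python sketch, not HAProxy's algorithm; the `/posts?id=N` URIs are hypothetical):

```python
import hashlib

def bucket(key, n_servers=3):
    # toy stand-in for a URI hash mapping a request to one of n servers
    return int(hashlib.md5(key.encode()).hexdigest(), 16) % n_servers

uris = [f"/posts?id={i}" for i in range(100)]

# default behavior: only the part left of '?' is hashed, so every
# request collapses onto a single server
default_targets = {bucket(u.split("?")[0]) for u in uris}

# with the whole URL hashed, the query string participates and the
# requests spread across the servers
whole_targets = {bucket(u) for u in uris}

print(default_targets, whole_targets)
```

One hundred distinct URIs sharing a base path all map to one server under path-only hashing, but spread across all three once the query string is included.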

I would also take into consideration how the load balancing affects the capacity of your cache servers. If it doesn't effectively affect redundancy in any way, I wouldn't worry too much about it; perfect load balancing isn't something you're likely to achieve with URI hashing.

I hope this helps.

Kvisle
  • The problem with including the query string in the hash is that utm_* and other variables that don't change the backend output, and are stripped by my Varnish configs, will then affect the hash and cause unnecessary backend traffic. – Pax Apr 14 '13 at 09:55
  • I can see how that could be a problem. Stripping those variables in HAProxy instead of in Varnish would help with that, but I don't know if that's possible. – Kvisle Apr 14 '13 at 14:39
  • It doesn't seem to be. It'd be nice if HAProxy let you define your own hash like varnish can. – Pax Apr 15 '13 at 07:37
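HAProxy 1.4 doesn't offer a way to customize the hash key, as the comments note, but the normalization itself is simple. A hypothetical Python sketch of stripping tracking parameters before hashing (the `strip_tracking` name and example URI are made up for illustration):

```python
from urllib.parse import parse_qsl, urlencode, urlsplit

def strip_tracking(uri):
    # drop utm_* parameters so they don't perturb the hash key; this is
    # the normalization the comments describe doing in Varnish
    parts = urlsplit(uri)
    kept = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True)
            if not k.startswith("utm_")]
    query = urlencode(kept)
    return parts.path + (f"?{query}" if query else "")

print(strip_tracking("/posts?id=7&utm_source=tw&utm_campaign=x"))
```

Feeding the normalized URI to the hash would keep tracking variants of the same page on the same cache server.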

I ended up changing the config as follows:

backend varnish
        # hash balancing
        balance uri
        hash-type consistent

        server varnish1 64.106.164.122:80 check observe layer7 maxconn 5000 id 1 weight 75
        server varnish2 64.106.164.121:80 check observe layer7 maxconn 5000 id 715827882 weight 50
        server varnish3 64.106.164.117:80 check observe layer7 maxconn 5000 id 1431655764 weight 38

It turns out that the IDs matter a lot. I now have them spaced out across the range, and this seems to help the balancing. I tweaked the weights as well, as you can see.

Now getting a result like this: (screenshot of the new HAProxy stats)

The middle server is still underused, but that's as close to balanced as I could get it, and it's fine for my purpose. I'm using HAProxy to do URI hashing so I could add this third Varnish server without increasing backend load, and it seems to be working well: I'm seeing a noticeable decrease in backend load with three URI-balanced Varnish servers versus two randomly balanced ones.

The takeaway from this is that the IDs matter a lot and should be spaced out, which I haven't seen clearly stated anywhere else. Once the IDs are spread out, changing the weights helps, but it's still very unpredictable and requires a lot of tweaking and trial and error. Drastically raising a server's weight can cause its traffic to drop significantly, which is a weird result.
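A toy model of why the ID spacing can matter (this is only an illustration consistent with the behavior observed above, not HAProxy 1.4's actual internals; it assumes each server occupies a single ring point equal to its ID):

```python
import hashlib

RING = 1 << 32  # 32-bit hash space

def uri_hash(uri):
    return int(hashlib.md5(uri.encode()).hexdigest(), 16) % RING

def distribution(ids, n_uris=30000):
    # toy model: each server sits at a single ring point equal to its
    # ID, and a URI goes to the first point at or after its hash
    # (wrapping around); clustered points starve some servers
    points = sorted((sid % RING, f"srv{i}") for i, sid in enumerate(ids))
    counts = {name: 0 for _, name in points}
    for n in range(n_uris):
        p = uri_hash(f"/article/{n}")
        owner = next((name for point, name in points if point >= p),
                     points[0][1])
        counts[owner] += 1
    return counts

print(distribution([1, 2, 3]))                      # clustered IDs
print(distribution([1, RING // 3, 2 * RING // 3]))  # spaced IDs
```

With IDs clustered at one end of the range, one server's point ends up "first after" nearly every URI hash and takes almost all the traffic; spacing the IDs across the range, as the config above does, splits the ring into roughly even arcs.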

Pax