3

I have provisioned a server with 8 cores and plan on deploying a network service. For spreading out request load I'd like to run 8 instances of my service. Nothing shocking here. I do not have access to a hardware load balancer. I should mention that I currently have allocated 5 public IP addresses (but I can get more).

Thus, I would like to hear your recommendations for structuring a software load balancing solution.

The obvious choices would be to either:

  • use HAProxy;
  • pre-fork my application (as Facebook's Tornado and Unicorn both do); or
  • insert your idea here.

My goals are to:

  • spread request load between the service instances; and
  • allow for rolling restarts of my service (code upgrades).

I should mention that this is not an HTTP-based service, so NGiNX and the like are out.

I do not love HAProxy because of its memory requirements; it seems to require a read and a write buffer per client connection. That would mean buffers at the kernel level, in HAProxy, and in my application. This is getting silly! Perhaps I'm missing something in this regard, though?

Thanks!

z8000

2 Answers

3

Whatever the solution, if you install a process to forward stream data, it will require per-connection buffers. This is because you can't always send everything you received, so you have to keep the excess in a buffer. That said, memory usage will depend on the number of concurrent connections. One large site is happily running haproxy with default settings at 150,000 concurrent connections (4 GB RAM). If you need more than that, version 1.4 lets you adjust the buffer size without recompiling. However, keep in mind that the per-socket kernel buffers will never go below 4 kB per direction and per socket, so 16 kB at least per connection (two sockets, each with a read and a write buffer). That means it's pointless to make haproxy run with less than 8 kB per buffer, as it would already consume less than the kernel.
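The forwarding buffer is easy to see in a minimal relay loop. This is only a sketch in Python (not HAProxy's actual code): recv() may hand you more than send() can flush in one call, so the excess has to sit in a user-space buffer until the other side drains it.

```python
import socket

def forward(src, dst, bufsize=8192):
    """Copy one direction of a proxied connection.

    The per-connection buffer is unavoidable: send() may accept
    fewer bytes than recv() returned, so the unsent remainder is
    kept in user space and retried.
    """
    while True:
        buf = src.recv(bufsize)          # fill the proxy's buffer
        if not buf:                      # peer closed its side
            break
        while buf:
            sent = dst.send(buf)         # may send less than len(buf)
            buf = buf[sent:]             # keep the unsent excess buffered
    dst.shutdown(socket.SHUT_WR)         # propagate EOF downstream
```

A real proxy runs one such loop per direction per connection, which is exactly why memory scales with concurrent connections.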

Also, if your service is pure TCP and a proxy adds no value, take a look at network-based solutions such as LVS. It's a lot cheaper because it works on packets and does not need to maintain buffers of its own; the socket buffers simply drop packets when they are full. It can also be installed on the same machine as the service.
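For reference, an LVS setup balancing a TCP port across several local instances might look like the following ipvsadm sketch. The VIP 192.0.2.10 and ports 9000-9008 are placeholders, and NAT mode (-m) is assumed here because the instances listen on different local ports:

```shell
# Sketch only -- 192.0.2.10 is a placeholder VIP.
# Create a virtual TCP service on the VIP with round-robin scheduling:
ipvsadm -A -t 192.0.2.10:9000 -s rr

# Add each local service instance as a real server. Masquerading
# mode (-m) is used so the destination port can be rewritten:
ipvsadm -a -t 192.0.2.10:9000 -r 127.0.0.1:9001 -m
ipvsadm -a -t 192.0.2.10:9000 -r 127.0.0.1:9002 -m
# ... repeat for the remaining instances
```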

Edit: Javier, preforked processes relying on the OS to do the load balancing do not scale that well at all. The OS wakes every process up when a connection comes in, only one of them gets it, and all the others go back to sleep. Haproxy in multi-process mode shows its best performance around 4 processes; at 8 processes, performance already starts to drop. Apache uses a nice trick against this: it takes a lock around the accept() so that only one process is waiting in accept(). But that kills the load-balancing feature of the OS and stops scaling between 1000 and 2000 processes. It should use an array of a few locks so that a few processes wake up, but it does not do that.
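The accept() serialization described above can be sketched like this (a Python illustration, with a multiprocessing lock standing in for Apache's cross-process accept mutex; function names are made up):

```python
import multiprocessing
import socket

def accept_one(listener, accept_lock):
    """Take the accept mutex, accept one connection, release.

    Serializing accept() means a new connection wakes only the one
    process holding the lock instead of every preforked worker --
    but a single lock also means only one process is ever waiting,
    which is what kills the OS's own load balancing.
    """
    with accept_lock:
        conn, _addr = listener.accept()
    return conn

def worker_loop(listener, accept_lock):
    # Each preforked child runs this loop; handling happens outside
    # the lock so the other workers can accept in the meantime.
    while True:
        conn = accept_one(listener, accept_lock)
        conn.sendall(b"hi\n")   # placeholder for real service logic
        conn.close()
```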

Willy Tarreau
    the beauty of a well-programmed prefork solution is that all your processes/threads are waiting on the same TCP connection, so the OS's TCP stack is your load balancer. Unfortunately, this is only possible if your server is programmed like that; if you simply run n instances and a load balancer, the balancer has to be _really_ well written to swiftly pass all the traffic. – Javier Jan 22 '10 at 21:12
  • Thanks Willy. I had not considered LVS for this (clearly I'm new to LB and such). – z8000 Jan 22 '10 at 21:53
  • @willy (about your edit): didn't know that. sounds like another reason to dynamically start and delete processes. after all, what's the point of having 2000 processes if they're all waiting on the net? you should have just enough processes to be (reasonably) sure that one is waiting for the next packet. in a heavily loaded server, most processes should be working, not waiting. – Javier Jan 24 '10 at 11:28
  • I am a little confused. Why would you ever have 2000 processes in a setup like this? Isn't that obviously overkill? – z8000 Jan 25 '10 at 01:54
  • @willy (edit): are you describing this? http://en.wikipedia.org/wiki/Thundering_herd_problem – z8000 Jan 25 '10 at 05:30
1

Without any details on your service it's very hard to say, but in general I'd lean toward preforking. It's a tried-and-true server strategy (and not a newfangled trick, as some people think after reading the Tornado/Unicorn fan sites).

Beyond that, a few tips:

  • each preforked process can use modern event-notification strategies beyond select() (libevent, mostly, which wraps epoll/kqueue) to handle huge numbers of clients.

  • it's very rare that a 1:1 relationship between cores and processes gives optimal performance; it's usually far better to adapt the number of processes to the load dynamically.
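The event-loop-per-worker idea in the first tip can be sketched with Python's standard selectors module standing in for libevent (the name make_worker is made up for illustration; a prefork parent would spawn one of these per child):

```python
import selectors
import socket

def make_worker(listener, handler):
    """One preforked worker's event loop: a single process
    multiplexes many clients instead of one process per client."""
    sel = selectors.DefaultSelector()
    listener.setblocking(False)
    sel.register(listener, selectors.EVENT_READ, None)

    def step(timeout=1.0):
        # Process one batch of ready sockets; a worker calls this
        # forever in a loop.
        for key, _mask in sel.select(timeout):
            if key.data is None:                 # listener is readable
                conn, _addr = listener.accept()
                conn.setblocking(False)
                sel.register(conn, selectors.EVENT_READ, handler)
            else:                                # a client is readable
                data = key.fileobj.recv(4096)
                if data:
                    key.data(key.fileobj, data)  # hand off to handler
                else:                            # client closed
                    sel.unregister(key.fileobj)
                    key.fileobj.close()

    return step
```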

Javier
  • I like the pre-forking concept for this. FWIW, Tornado's version of such: http://github.com/facebook/tornado/commit/6fb90ae694190fcedc48d9fb98b02325826d783e – z8000 Jan 22 '10 at 21:51
  • Re: dynamic adaptability... Can you elaborate on this or point me to something I can learn from? Intuitively I don't see a problem with having more than enough processes ready to handle requests, but I'm sure that's not the whole story. – z8000 Jan 22 '10 at 21:52
  • just be able to fork a few extra processes if you're getting too many clients on each, and kill a few when the load gets lower. – Javier Jan 22 '10 at 22:38
  • Straightforward enough :) – z8000 Jan 24 '10 at 05:26