
I am planning to write a 'comet' server for 'streaming' data to clients. I have enhanced one in the past to take advantage of multi-core CPUs, but now I'm starting from scratch. I am planning to use epoll/kqueue or libevent to power the server.

One of the issues I have been weighing is which server design to use. I have several options available, since I am planning to use a multi-process model to take advantage of all the CPU cores.

  1. Pre-forked multi-process - each process doing its own accept
  2. Pre-forked multi-process with master - the master process accepts and then uses descriptor passing to hand the accepted socket to a worker process
  3. Pre-forked multi-process with different ports - each process listens on a different port on the same system. A load balancer decides which process gets the next connection based on load feedback from the individual daemon processes

Design #2 is the most complicated. Design #3 is simple but involves additional hardware that I will need irrespective of the design, since I'll have this running on several machines and would require a load balancer anyway. Design #1 has the thundering herd issue, but I guess thundering herd isn't a big deal with 8 processes; it only becomes a big deal when clients constantly connect and disconnect (which should be rare, since this is a comet server).

As I see it, #2 is complicated and requires 2 additional system calls per accept for the descriptor passing between the master and worker processes. Is it better to accept this overhead as opposed to the thundering herd problem? If I have 8 processes waking up and executing an accept, am I potentially going to see 8 accept calls in case I go with Design #1?

What are the pros and cons of my design choices? What would you recommend?

void_ptr
  • Since there can't be more than one socket listening on a single port, I have a hard time seeing how alternative 1 is possible. If you want clients to connect to a single port, you have to do some multiplexing of some kind, which only leaves alternative 2 and 3. – Some programmer dude Nov 21 '11 at 09:41
  • @alk, my application lends itself as a great usecase for asynchronous IO/NBIO. Threads involve too much context switching overhead. – void_ptr Nov 21 '11 at 17:53
  • @Joachim check out UNIX Network Programming, Vol. 1, 2nd Ed., for an explanation of how #1 will work. You can also search for the thundering herd problem. – void_ptr Nov 21 '11 at 17:58
  • @Joachim Pileborg: I assume option 1 is planned to have one listen()'ner in a parent and multiple accept()'ors in the children. – alk Nov 21 '11 at 18:42
  • @void_ptr: why thundering herd for #1? Are you running this on something other than Linux? – ninjalj Nov 21 '11 at 19:32
  • @ninjalj: I am using Linux. Are there newer ways of avoiding thundering herd on Linux? I assume if I have a bunch of processes waiting on accept, thundering herd is inevitable. – void_ptr Nov 22 '11 at 04:47
  • @void_ptr: the thundering herd problem for `accept()` was fixed during Linux 2.2 timeframe (so, last century). Look at http://lxr.free-electrons.com/source/net/ipv4/inet_connection_sock.c?v=3.1#L225 – ninjalj Nov 22 '11 at 18:35

2 Answers


If these were threads rather than processes, I'd go for option 2. For processes, however, the descriptor passing looks expensive to me, so the choice is between 1 and 3.

I'd prefer 1, if it is possible to somehow estimate the expected load. Could you set an upper limit on the size of the sleeping herd, that is, on the number of pre-forked processes? How fast do you need to be able to accept new connections?

But if you're going to go the Tom Dunson way and drive the big herd fast over the Red River down to Kansas, you probably need to choose the third option, since the resources (the load balancer) will be available anyway.

alk
  • I'm not sure how I am supposed to put an upper bound on that. I want the connections to be accepted as quickly as possible. I understand threading can help here because there won't be a delay for descriptor passing between processes, but I am not sure how threading makes things better, especially in an application like 'Comet' that predominantly has long-poll connections. – void_ptr Nov 22 '11 at 04:46

If you aim to build a very large-scale, high-throughput HTTP daemon, none of #1, #2, and #3 is appropriate. You'd better use 1-to-m or m-to-n models with multi-threading if you want scalability, the way nginx/lighttpd do.

In fact, if you expect the program to handle fewer than a hundred connections per second, then #1, #2, and #3 may not make any visible difference.

However, I would go for #2 in case you scale up your program in the future by switching from processes to threads, since it can be easily integrated into 1-to-m or m-to-n processing models.

ddoman
  • Can you point me to the architecture of nginx / lighttpd? I'd be interested in understanding how they use threading along with event-based async IO. I think even memcached uses threads + libevent, but I'm not sure how threading fits in. The main reason to use multiple processes is to make sure we can utilize all CPU cores. – void_ptr Nov 22 '11 at 04:44
  • They don't use AIO by default, at least on Linux, but multiplex with epoll(). Thus they are both the 1-to-m model: one thread handles accept/recv/send all. – ddoman Nov 22 '11 at 22:28
  • you're not addressing the issue I asked about. In a multi-core environment, if I have a single thread doing the accept, recv, and send calls, then I cannot take advantage of the other cores. In my original post, one process would do all the accept/recv/send, which is the same as saying a single thread would do them. – void_ptr Nov 23 '11 at 07:11