5

I'm currently writing an HTTP server in C to learn about C, network programming and HTTP. I've implemented most of the simple stuff, but I'm only handling one connection at a time. Now I'm thinking about how to efficiently add multitasking to my project. Here are some of the options I've considered:

  1. Use one thread per connection. Simple but can't handle many connections.
  2. Use non-blocking API calls only and handle everything in one thread. Sounds interesting but using select()s and such excessively is said to be quite slow.
  3. Some other multithreading model, e.g. something complex like lighttpd uses. (Probably) the best solution, but (probably) too difficult to implement.

Any thoughts on this?

ryyst

5 Answers

5

There is no single best model for writing multi-tasked network servers. Different platforms have different solutions for high performance (I/O completion ports, epoll, kqueues). Be careful about going for maximum portability: some features are mimicked on other platforms (e.g. select() is available on Windows) and yield very poor performance because they are simply mapped onto some other native model.

Also, there are other models not covered in your list. In particular, the classic UNIX "pre-fork" model.

In all cases, use any form of asynchronous I/O when available. If it isn't, look into non-blocking synchronous I/O. Design your HTTP library around asynchronous streaming of data, but keep the I/O bit out of it. This is much harder than it sounds. It usually implies writing state machines for your protocol interpreter.
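As a rough illustration of keeping the I/O out of the protocol code, here is a minimal sketch of a byte-fed state machine that parses only the HTTP request line. The names (`rl_parser`, `rl_init`, `rl_feed`) and the buffer sizes are made up for this example; it assumes the caller obtains the bytes however it likes (blocking `read`, `epoll`, a completion port) and hands them over in arbitrary chunks.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical states for "METHOD SP TARGET SP VERSION CRLF". */
enum rl_state { RL_METHOD, RL_TARGET, RL_VERSION, RL_LF, RL_DONE, RL_ERROR };

struct rl_parser {
    enum rl_state state;
    char method[16];
    char target[256];
    char version[16];
    size_t len;              /* bytes stored in the token being filled */
};

static void rl_init(struct rl_parser *p) {
    memset(p, 0, sizeof *p); /* all tokens start out as empty strings */
    p->state = RL_METHOD;
}

/* Append one byte to the current token; switch to RL_ERROR if it is full. */
static void rl_put(struct rl_parser *p, char *buf, size_t cap, char c) {
    if (p->len + 1 >= cap) { p->state = RL_ERROR; return; }
    buf[p->len++] = c;
    buf[p->len] = '\0';
}

/* Feed a chunk of bytes. Returns how many bytes were consumed; parsing
 * stops at the end of the request line or at the first error. */
static size_t rl_feed(struct rl_parser *p, const char *data, size_t n) {
    size_t i = 0;
    assert(p != NULL && (data != NULL || n == 0));
    while (i < n && p->state != RL_DONE && p->state != RL_ERROR) {
        char c = data[i++];
        switch (p->state) {
        case RL_METHOD:
            if (c == ' ') { p->state = RL_TARGET; p->len = 0; }
            else rl_put(p, p->method, sizeof p->method, c);
            break;
        case RL_TARGET:
            if (c == ' ') { p->state = RL_VERSION; p->len = 0; }
            else rl_put(p, p->target, sizeof p->target, c);
            break;
        case RL_VERSION:
            if (c == '\r') p->state = RL_LF;
            else rl_put(p, p->version, sizeof p->version, c);
            break;
        case RL_LF:
            p->state = (c == '\n') ? RL_DONE : RL_ERROR;
            break;
        default:
            break;
        }
    }
    return i;
}
```

Because all progress lives in `struct rl_parser`, the same parser can sit behind a thread per connection, a `select()` loop, or an asynchronous callback: each time bytes arrive, call `rl_feed` and check `state`. A real parser would also have to validate the tokens and go on to headers and body, which this sketch leaves out.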

That last bit is the most important, because it will allow you to experiment with different I/O models. It might even allow you to write a compact core around each platform's local, high-performance tools and swap this core from one platform to the other.

André Caron
  • +1 for the state machine paragraphs. The different alternatives are mainly about how to store state and how to context-switch (thread/process per connection stores state implicitly on the stack and context-switches via the OS scheduler, select() stores state explicitly and context-switches via select(), libevent stores state explicitly and context-switches implicitly when calling the appropriate callback, ...). – ninjalj Apr 15 '11 at 21:24
  • @ninjalj: state machines are the way to go. They're usually a pain to implement, as they look nothing like the "synchronous logic" people are used to writing. I've recently been looking into [Clojure](http://clojure.org/) because it has native support for asynchronous state machines. – André Caron Apr 15 '11 at 22:23
4

Yeah, do the one that's interesting to you. When you're done with it, if you're not utterly sick of the project, benchmark it, profile it, and try one of the other techniques. Or, even more interesting, abandon the work, take what you've learned, and move on to something completely different.

Will Hartung
2

You could use an event loop as in node.js:

Source code of node (C, C++, JavaScript)

https://github.com/joyent/node

Ryan Dahl (the creator of node) outlines the reasoning behind the design of node.js, non-blocking I/O and the event loop as an alternative to multithreading in a web server.

http://www.yuiblog.com/blog/2010/05/20/video-dahl/

Douglas Crockford discusses the event loop in Scene 6: Loopage (Friday, August 27, 2010)

http://www.yuiblog.com/blog/2010/08/30/yui-theater-douglas-crockford-crockford-on-javascript-scene-6-loopage-52-min/

An index of Douglas Crockford's above talk (if further background information is needed). Doesn't really apply to your question though.

http://yuiblog.com/crockford/

Homer6
  • Also, another question may be helpful: http://stackoverflow.com/questions/409087/creating-a-web-server-in-pure-c – Homer6 Apr 15 '11 at 21:36
0

Look at your platform's most efficient socket polling model: epoll (Linux), kqueue (FreeBSD), WSAEventSelect (Windows). Perhaps combine it with a thread pool and handle N connections per thread. You could always start with select() and then replace it with a more efficient model once it works.
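For the "start with select" step, a minimal sketch of the readiness check could look like the following. `wait_readable` is a hypothetical helper name invented for this example, not part of any library; it assumes POSIX `select()` and descriptors below `FD_SETSIZE`, and it does not retry when a signal interrupts the call.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/select.h>
#include <sys/time.h>
#include <unistd.h>

/* Hypothetical helper: wait up to timeout_ms for any of fds[0..n) to become
 * readable. Stores the readable descriptors in ready[] (which must have room
 * for n entries) and returns how many there are, 0 on timeout, -1 on error. */
static int wait_readable(const int *fds, int n, int *ready, int timeout_ms) {
    fd_set rset;
    struct timeval tv;
    int maxfd = -1;
    int count = 0;

    FD_ZERO(&rset);
    for (int i = 0; i < n; i++) {
        assert(fds[i] >= 0 && fds[i] < FD_SETSIZE);
        FD_SET(fds[i], &rset);
        if (fds[i] > maxfd) maxfd = fds[i];
    }

    tv.tv_sec = timeout_ms / 1000;
    tv.tv_usec = (timeout_ms % 1000) * 1000;

    int rc = select(maxfd + 1, &rset, NULL, NULL, &tv);
    if (rc <= 0) return rc;          /* timeout (0) or error (-1) */

    /* select() rewrote rset so that it contains only the readable fds. */
    for (int i = 0; i < n; i++)
        if (FD_ISSET(fds[i], &rset)) ready[count++] = fds[i];
    return count;
}
```

In a server, `fds` would hold the listening socket plus every open client socket: a readable listening socket means `accept()` will not block, and a readable client socket means `read()` will return data or end-of-file. If the rest of the code only depends on a helper like this one, swapping the body for epoll or kqueue later stays a local change.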

Erik
0

A simple solution might be having multiple processes: have one process accept connections, and as soon as the connection is established fork and handle the connection in that child process.

An interesting variant of this technique is used by the SER/OpenSER/Kamailio SIP proxies: there's one main process that accepts the connections and multiple child worker processes, connected via pipes. The parent sends the new file descriptor to a worker through a UNIX domain socket. See this book excerpt at 17.4.2, Passing File Descriptors over UNIX Domain Sockets. The OpenSER/Kamailio SIP proxies are used for heavy-duty SIP processing where performance is a huge issue, and they do very well with this technique (plus shared memory for information sharing). Multi-threading is probably easier to implement, though.
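A minimal sketch of the descriptor-passing step, assuming a connected UNIX domain socket between parent and worker and the `SCM_RIGHTS` ancillary message type. The helper names `send_fd` and `recv_fd` are made up for this example; this is not how Kamailio itself is written, only the general mechanism.

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Hypothetical helper: send descriptor fd over UNIX domain socket sock.
 * Returns 0 on success, -1 on failure. */
static int send_fd(int sock, int fd) {
    char byte = 'F';  /* one byte of ordinary data carries the control message */
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union { struct cmsghdr align; char buf[CMSG_SPACE(sizeof(int))]; } ctl;
    struct msghdr msg;

    assert(fd >= 0);
    memset(&msg, 0, sizeof msg);
    memset(&ctl, 0, sizeof ctl);
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctl.buf;
    msg.msg_controllen = sizeof ctl.buf;

    struct cmsghdr *cm = CMSG_FIRSTHDR(&msg);
    cm->cmsg_level = SOL_SOCKET;
    cm->cmsg_type = SCM_RIGHTS;             /* payload is an array of fds */
    cm->cmsg_len = CMSG_LEN(sizeof(int));
    memcpy(CMSG_DATA(cm), &fd, sizeof(int));

    return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
}

/* Hypothetical helper: receive one descriptor sent by send_fd.
 * Returns the new descriptor in this process, or -1 on failure. */
static int recv_fd(int sock) {
    char byte;
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union { struct cmsghdr align; char buf[CMSG_SPACE(sizeof(int))]; } ctl;
    struct msghdr msg;
    int fd;

    memset(&msg, 0, sizeof msg);
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctl.buf;
    msg.msg_controllen = sizeof ctl.buf;

    if (recvmsg(sock, &msg, 0) <= 0) return -1;

    struct cmsghdr *cm = CMSG_FIRSTHDR(&msg);
    if (cm == NULL || cm->cmsg_level != SOL_SOCKET ||
        cm->cmsg_type != SCM_RIGHTS || cm->cmsg_len != CMSG_LEN(sizeof(int)))
        return -1;
    memcpy(&fd, CMSG_DATA(cm), sizeof fd);
    return fd;
}
```

The parent would typically create one `socketpair(AF_UNIX, SOCK_STREAM, 0, sv)` per worker before forking, call `send_fd(sv[0], conn)` after each `accept()`, and then close its own copy of `conn`. The number the worker gets back from `recv_fd` may differ from the parent's, but it refers to the same open connection.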

DarkDust