How to design a carrier grade SIP Server?

Question

There are some SIP servers that handle a few thousand subscribers and others that can handle millions of subscribers on similar underlying hardware. What design and development factors should be considered when implementing SIP servers that can handle such a massive amount of traffic?

score 1 · Accepted Answer · answered Oct 30 '15 at 13:54

First, I'm going to assume you're interested in building your own SIP signaling applications, so your question is directed at SIP application servers and the applications running on top of them. I'm not going to talk about products like Asterisk. There aren't that many choices when it comes to Java app servers that include SIP servlet containers. Basically the big three are IBM's WebSphere server, Oracle's Communications server and Mobicents. I'm mostly familiar with WebSphere, which you can download for free at www.wasdev.net, but I'm sure all of these products scale very well on the signaling side: well beyond a couple of thousand endpoints, and if you're willing to cluster the servers you could support thousands of calls per second fairly easily. This is how tier one providers like AT&T scale their VoIP services to massive numbers of endpoints.

If you are including media processing in your question, that is where you quickly start getting into scalability issues. Server-side media processing (record/playback, multiway mixing, SFUs, etc.) can be extremely processor intensive. In the SIP servlet world, media servers are controlled via the media control API (JSR 309), which separates the signaling plane from the media plane. So it's hard to provide an answer to your question without knowing more about the types of applications your SIP server needs to host.

There are a lot of factors that can affect the scalability of SIP servlet applications that rely on a SIP servlet container. Threading is the key. You want to make sure you are never blocking threads in your application code. Everything must be asynchronous to scale. For developers not used to writing asynchronous code this can take a little getting used to, but it's critical to figure this out before taking on any real-time signaling development. In terms of Java servers, you also want to tune your JVM for the best possible results. This goes beyond making sure you have enough heap space to accommodate the number of calls per second your server must support. There are many JVM Garbage Collection (GC) knobs that can be turned to adjust the nursery size, etc. It's critical that the GC configuration is right for your server. Most JVMs also have specific GC algorithms that are designed to work better for real-time applications. For instance, the JVM used with IBM WebSphere supports a GC algorithm called metronome, which trades off GC activity for low latency.
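To illustrate the "never block a thread" point, here is a minimal sketch in Python with asyncio rather than against the SIP servlet API. The names `lookup_route`, `handle_request`, `serve` and `run_demo` are hypothetical, and the 50 ms sleep is an assumed stand-in for a location-database or DNS lookup.

```python
import asyncio
import time


async def lookup_route(user):
    # Assumed stand-in for slow I/O (location database, DNS, ...).
    # `await` hands the thread back to the event loop while waiting.
    await asyncio.sleep(0.05)
    return f"sip:{user}@10.0.0.1"


async def handle_request(user):
    # Hypothetical request handler: it never blocks the event loop thread.
    return await lookup_route(user)


async def serve(users):
    # All requests are in flight at the same time on a single thread.
    return await asyncio.gather(*(handle_request(u) for u in users))


def run_demo(n=200):
    # Returns the computed routes and the wall-clock time taken.
    users = [f"user{i}" for i in range(n)]
    start = time.perf_counter()
    routes = asyncio.run(serve(users))
    return routes, time.perf_counter() - start
```

If the handler used a blocking `time.sleep(0.05)` instead, the 200 lookups would run one after another on that thread and take about 10 seconds; here the waits overlap.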

This is an enormous subject, so if you can provide some more details about what you're trying to accomplish with a SIP server I might be able to provide more insight.

Thank you for the detailed explanation. The third paragraph in your answer is what I am more interested in, i.e. the coding techniques that should be used to implement a scalable SIP server or other telecom software like an LTE packet core (EPC). It would be great if you could suggest books or papers that can guide me on building carrier grade software. — Jay, Nov 02 '15 at 05:01
This book on SIP Servlet 1.1 development looks like it has some good reviews and covers subjects like SIP clustering which are critical for scalability: http://amzn.to/1SlsU7x Hope this helps. — bpulito, Nov 03 '15 at 15:37

score 1 · Answer 2 · answered Oct 30 '15 at 16:12

From a pure standards (RFC 3261) point of view, a SIP server handles SIP traffic with the purpose of routing (finding users or other servers) and nothing else. I'm assuming here we are talking about a SIP proxy server.

Incoming requests are handled on a SIP server in either stateful or stateless mode.

You can get the definitions of both from RFC 3261:

  Stateful Proxy: A logical entity that maintains the client and
     server transaction state machines defined by this specification
     during the processing of a request, also known as a transaction
     stateful proxy.  The behavior of a stateful proxy is further
     defined in Section 16.  A (transaction) stateful proxy is not
     the same as a call stateful proxy.

  Stateless Proxy: A logical entity that does not maintain the
     client or server transaction state machines defined in this
     specification when it processes requests.  A stateless proxy
     forwards every request it receives downstream and every
     response it receives upstream.

A SIP request arriving at a stateful proxy server usually exists in memory for only a short time. For example, routing a BYE requires allocating 2 transactions: one incoming (server) and one outgoing (client). They stay in memory until both reach the "Terminated" state, as described in Figure 6 and Figure 8 of RFC 3261. With TCP, Timer J and Timer K are 0 seconds, so the theoretical duration is little more than the time it takes to receive the answer. With UDP, Timer J is 32 seconds, so the allocated transaction context must exist for at least 32 seconds after the final response (to handle retransmissions).
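The timer arithmetic above can be turned into a rough memory estimate. The sketch below uses the RFC 3261 defaults (T1 = 0.5 s, T4 = 5 s, Timer J = 64*T1 and Timer K = T4 over UDP, both 0 over TCP). The function names, the 100 ms response delay and the simplified lifetime model are assumptions made for illustration only.

```python
T1 = 0.5  # RFC 3261 default round-trip time estimate, in seconds
T4 = 5.0  # RFC 3261 default maximum time a message stays in the network


def timer_j(transport):
    # Non-INVITE server transaction: wait time for request retransmissions.
    return 64 * T1 if transport == "udp" else 0.0


def timer_k(transport):
    # Non-INVITE client transaction: wait time for response retransmissions.
    return T4 if transport == "udp" else 0.0


def transactions_in_memory(requests_per_second, transport, response_delay=0.1):
    # Little's law: items in the system = arrival rate x time in the system.
    # Assumed model: each request creates one server and one client
    # transaction, each living for response_delay plus its timer.
    server_life = response_delay + timer_j(transport)
    client_life = response_delay + timer_k(transport)
    return requests_per_second * (server_life + client_life)
```

Under these assumptions, 1000 non-INVITE requests per second keep roughly 37,200 transaction contexts alive over UDP, against roughly 200 over TCP.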

To save memory and go faster, stateless processing can be used. However, this means each retransmission requires a new computation that must arrive at exactly the same result as the first one. Under high loss or slow traffic, this can increase CPU usage compared to stateful mode. Stateless mode also requires every copy of a request to be processed in exactly the same way, and it is usually used for rejecting unwanted or broken traffic: this may help, for example, to resist DDoS attacks, to reject messages with syntax issues, or to reject forbidden traffic.
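The trade-off between the two modes can be shown with a toy model. This is only a sketch: the class names, the `route` function and the target list are hypothetical, and transactions are keyed on the Via branch alone, whereas real matching per RFC 3261 also uses the sent-by value and the method.

```python
import zlib

TARGETS = ["10.0.0.1", "10.0.0.2", "10.0.0.3"]  # hypothetical next hops


def route(request_uri):
    # Deterministic routing: the same URI always maps to the same target.
    return TARGETS[zlib.crc32(request_uri.encode()) % len(TARGETS)]


class StatelessProxy:
    def __init__(self):
        self.computations = 0

    def on_request(self, branch, request_uri):
        # No memory of earlier requests: every retransmission is routed
        # again and must land on the same target.
        self.computations += 1
        return route(request_uri)


class StatefulProxy:
    def __init__(self):
        self.computations = 0
        self.transactions = {}  # branch -> chosen target

    def on_request(self, branch, request_uri):
        if branch in self.transactions:
            # Retransmission absorbed by the existing transaction:
            # nothing new to route or forward.
            return None
        self.computations += 1
        self.transactions[branch] = route(request_uri)
        return self.transactions[branch]
```

Feeding the same request three times costs the stateless proxy three routing computations and no memory, while the stateful proxy computes once but has to keep the transaction entry around.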

Of course, the next question will be about implementation: you'll need a good OS, good threading, asynchronous non-blocking DNS and socket operations, and care about memory usage, allocations, etc.
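As a small illustration of non-blocking socket and DNS operations, here is a sketch using Python's asyncio. The class and function names are hypothetical, the reply bytes are a placeholder rather than a valid SIP response, and a production proxy would more likely be written in C, as the SER family is.

```python
import asyncio
import socket


class SipUdpProtocol(asyncio.DatagramProtocol):
    def __init__(self):
        self.received = []
        self.transport = None

    def connection_made(self, transport):
        self.transport = transport

    def datagram_received(self, data, addr):
        # Called by the event loop when a packet is ready, so no thread
        # sits blocked in recv(). The reply is a placeholder, not real SIP.
        self.received.append(data)
        self.transport.sendto(b"placeholder-reply", addr)


async def resolve(host, port):
    # asyncio runs getaddrinfo in its default executor, so the event
    # loop keeps serving packets while the lookup is pending.
    loop = asyncio.get_running_loop()
    infos = await loop.getaddrinfo(
        host, port, family=socket.AF_INET, type=socket.SOCK_DGRAM
    )
    return infos[0][4]  # (address, port) of the first result


async def start_listener(host="127.0.0.1", port=0):
    # Binds a UDP socket and registers it with the event loop.
    loop = asyncio.get_running_loop()
    return await loop.create_datagram_endpoint(
        SipUdpProtocol, local_addr=(host, port)
    )
```

Calling `await start_listener()` inside a coroutine returns the transport and protocol pair; one thread can then serve many sockets because every wait goes through the event loop.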

You mentioned that some servers handle a few thousand subscribers and others handle millions: in fact, there is no reason a real SIP proxy server couldn't handle millions (if well coded, of course). The servers handling "only" a few thousand subscribers are usually acting as endpoints (like Asterisk): they are not SIP servers as described in RFC 3261 at all, i.e. SIP proxy servers.

As a side note, even true proxy servers are usually able to cheat and insert a media relay, i.e. handle RTP relaying. While this is required today to handle media establishment (if you don't have ICE), it introduces a severe limitation in terms of bandwidth: 95% (or whatever the exact figure is) of the traffic will become RTP instead of SIP only, and bandwidth will be the limitation for your server.
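A back-of-the-envelope calculation shows why RTP dominates once a relay is inserted. The sketch assumes G.711 audio at 20 ms packetization over IPv4, a 3-minute call, and about 10 SIP messages of about 800 bytes per call. The call length and the SIP figures are illustrative guesses, not measurements, and only the bytes the relay receives are counted.

```python
def rtp_bytes_per_second(payload_bytes=160, packets_per_second=50):
    # G.711 at 20 ms packetization: 160 payload bytes, 50 packets per second.
    # Per-packet headers: RTP 12 + UDP 8 + IPv4 20 = 40 bytes.
    return (payload_bytes + 40) * packets_per_second


def rtp_share(call_seconds=180, sip_messages=10, sip_message_bytes=800):
    # Two audio streams reach the relay, one from each party.
    rtp = 2 * rtp_bytes_per_second() * call_seconds
    # Assumed signaling volume for the whole call.
    sip = sip_messages * sip_message_bytes
    return rtp / (rtp + sip)
```

With these defaults each stream is 10,000 bytes per second (80 kbit/s), and RTP ends up at more than 99% of the bytes the server receives for the call.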

Looking at the SER-related projects (SER, Kamailio, OpenSIPS...) demonstrates everything described above:

they can handle transactions in stateful mode
they can handle transactions in stateless mode
they can be configured to only do routing
if configured that way they could handle a lot of subscribers
if you activate a few extras (RTP relay, call control, presence...), the server is no longer just a proxy and limitations will appear anyway!
