How to maintain servers without disconnecting clients?

Question

I am designing and developing a new messenger service (like the Yahoo and Gtalk messengers), so my clients will be continuously connected to the servers. I would like to know how such servers can be maintained (applying fixes, upgrades, etc.) while clients are still connected.

One traditional way is to disconnect clients and bring down the servers, resulting in downtime. However, this is not an ideal solution because fixes and enhancements could be frequent. I have hardly ever seen the Yahoo or Gtalk messenger services down for maintenance, so I was wondering how they do this. (I believe this is a design decision.)

Any thoughts/links would be great.

Please don't invent another chat protocol. XMPP (which GTalk and many others actually use) is excellent already. There's no need to reinvent the wheel, especially if you are working on it by yourself. – Chris S, Jun 24 '13 at 13:56

score 4 · Answer 1 · answered Jun 24 '13 at 13:54


You're right. This is an architectural decision. This is typically done with some form of application load-balancing...

Your clients should not connect directly to the servers, but to a load balancer that routes client traffic to a pool of back-end servers based on relevant criteria: round-robin, which server is least busy, which servers are online or offline, and so on.

This is what allows applications to scale, with the benefit of providing redundancy and the ability to take nodes offline for maintenance.
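As a rough illustration of that routing decision, here is a minimal sketch in Python. The `Backend` and `LoadBalancer` names, the `chat-N` node names, and the least-busy policy are all assumptions made for the example; a real deployment would normally use an existing load balancer rather than code like this.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Backend:
    name: str
    online: bool = True          # False while the node is out for maintenance
    active_connections: int = 0


class LoadBalancer:
    """Toy balancer: routes each new client to the least-busy online backend."""

    def __init__(self, backends: List[Backend]) -> None:
        self.backends = backends

    def pick(self) -> Backend:
        # Offline nodes are never candidates for new connections.
        candidates = [b for b in self.backends if b.online]
        if not candidates:
            raise RuntimeError("no backend available")
        # Least-busy policy; ties go to the first backend in the list.
        chosen = min(candidates, key=lambda b: b.active_connections)
        chosen.active_connections += 1
        return chosen


# Hypothetical three-node pool.
pool = [Backend("chat-1"), Backend("chat-2"), Backend("chat-3")]
lb = LoadBalancer(pool)

first_three = [lb.pick().name for _ in range(3)]
pool[1].online = False           # take chat-2 out for maintenance
next_two = [lb.pick().name for _ in range(2)]
```

Once `chat-2` is marked offline, new clients only land on the remaining nodes, which is what makes it possible to work on a node without taking the whole service down.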



ewwhite


If I bring down a particular node and add a new feature (of my messenger service) to it, then when that node comes back up the other nodes still don't have the feature. My question pertains to this situation. – ServerDesigner Jun 25 '13 at 12:11
Then you perform a *rolling upgrade*, where you update each member node in turn while keeping the system as a whole available. – ewwhite Jun 25 '13 at 12:15
The request queue will keep growing while the rolling upgrade is being performed (especially with a huge client base). I need to make sure I finish the upgrade before it overflows. All of this made me wonder how these folks (Yahoo, Google) might have achieved it. – ServerDesigner Jun 25 '13 at 13:00

score 0 · Answer 2 · answered Jun 24 '13 at 13:52

The reason you don't see downtime for Yahoo, Gtalk, or practically any other service at scale is that those services have two things going for them: 1) load balancing, or some form of round-robin, across a pool of servers, and 2) clusters, i.e. that pool of servers.

The process basically breaks down to this: remove a server from the active pool, perform your maintenance, add it back as active, move on to the next server, and repeat. While this approach is both more time-consuming and more expensive, it is the most (read: only) correct way to do what you're asking without downtime.

You could just have a simple mirror between two servers, depending on the service, but that will break down quickly at any kind of real scale. While load balancing doesn't directly provide what you're asking for, it is a main factor in how it's accomplished, and it means you can distribute work across all nodes in the cluster more effectively instead of one server taking all the load, as in a mirror.
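The remove, maintain, re-add, repeat loop can be sketched roughly as follows. This is only an illustration under stated assumptions: `rolling_upgrade`, the `in_rotation` map, and the `upgrade` and `healthy` callbacks are hypothetical names, the health check before re-adding a node is an addition not mentioned above, and draining clients that are already connected is left out.

```python
from typing import Callable, Dict, List


def rolling_upgrade(
    nodes: List[str],
    in_rotation: Dict[str, bool],
    upgrade: Callable[[str], None],
    healthy: Callable[[str], bool],
) -> List[str]:
    """Upgrade nodes one at a time; return the nodes that were upgraded."""
    done = []
    for node in nodes:
        in_rotation[node] = False      # 1. stop sending new clients here
        upgrade(node)                  # 2. perform the maintenance
        if not healthy(node):          # 3. assumed check before re-adding
            raise RuntimeError(f"{node} failed its health check; stopping")
        in_rotation[node] = True       # 4. put the node back in the pool
        done.append(node)
    return done


# Hypothetical cluster state: node name -> software version.
versions = {"chat-1": "1.0", "chat-2": "1.0", "chat-3": "1.0"}
rotation = {name: True for name in versions}
serving_during_upgrade = []


def upgrade(node: str) -> None:
    # Record how many nodes are still serving while this one is down.
    serving_during_upgrade.append(sum(rotation.values()))
    versions[node] = "1.1"


upgraded = rolling_upgrade(
    list(versions), rotation, upgrade, lambda n: versions[n] == "1.1"
)
```

Because only one node is out of rotation at any moment, the rest of the pool keeps serving for the whole upgrade. Note that old and new versions run side by side during the rollout, so the two versions have to be able to interoperate.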

score 0 · Answer 3 · answered Jun 24 '13 at 13:53

You may not notice the maintenance, but most of the time these services use what's called "warm failover": a cluster of machines handles requests in round-robin fashion. When a machine needs to be brought down for service, it is taken out of the pool of servers and shut down. Affected clients basically reconnect immediately to the next available server. Clients could also be programmed to change servers automatically at the request of the server itself.
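On the client side, that behaviour might look something like the sketch below. Everything here is an assumption for illustration: the `ChatClient` class, the `can_connect` probe that stands in for a real network connection attempt, and the `"please-reconnect"` message type, which is a made-up placeholder rather than part of any real protocol.

```python
from typing import Callable, List, Optional


class ChatClient:
    """Toy client that knows several servers and can move between them."""

    def __init__(self, servers: List[str], can_connect: Callable[[str], bool]) -> None:
        self.servers = servers
        self.can_connect = can_connect      # stand-in for a real connection attempt
        self.current: Optional[str] = None

    def connect(self) -> str:
        # Start with the server after the current one and wrap around the list.
        if self.current in self.servers:
            start = self.servers.index(self.current) + 1
        else:
            start = 0
        for offset in range(len(self.servers)):
            candidate = self.servers[(start + offset) % len(self.servers)]
            if candidate != self.current and self.can_connect(candidate):
                self.current = candidate
                return candidate
        raise ConnectionError("no server available")

    def on_message(self, message: dict) -> None:
        # Hypothetical control message sent by a server about to go down.
        if message.get("type") == "please-reconnect":
            self.connect()


# Hypothetical set of servers that are currently up.
up = {"chat-1", "chat-2", "chat-3"}
client = ChatClient(["chat-1", "chat-2", "chat-3"], lambda s: s in up)

first = client.connect()
up.discard("chat-1")                              # chat-1 goes down for service
client.on_message({"type": "please-reconnect"})
second = client.current
```

Having the server ask clients to move before it shuts down lets them switch at a convenient moment instead of discovering a dropped connection.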

A lot of this is just design theory; only testing and research will tell you what actually works for your service.
