You're really looking at two different requirements:
- Load balancing: expose a single network address for multiple Web (or other protocol) servers.
- Communicating state (the messages) between multiple servers.
The first requirement is straightforward: use a hardware or software load balancer, or use a single Apache web server in front of multiple Java servers.
The second requirement is the issue.
Let's think about a hypothetical chat server on a single system. When a message is received, the request is parsed and the new message is placed in memory for the recipient. You'll need to handle common situations: a user logging off in the middle of a session, for example. You'll also need to figure out how to pass received messages back to the users' browsers: the browser can poll ("send me all messages after #N for user X"), or the server may be able to push messages using one of several techniques. If you have a chat server running on top of a Web server, this should all be familiar.
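To make the single-server picture concrete, here's a minimal sketch of the in-memory mailbox with the "send me all messages after #N" polling call. All names are hypothetical, and a real server would bound mailbox growth:

```java
import java.util.*;
import java.util.concurrent.*;

// Minimal single-server mailbox sketch (hypothetical names).
// Each recipient gets an append-only message list; the browser polls
// with the last sequence number it has already seen.
public class ChatMailbox {
    // recipient -> ordered list of messages (index + 1 = sequence number)
    private final ConcurrentMap<String, List<String>> boxes = new ConcurrentHashMap<>();

    // Store a new message for a recipient; returns its sequence number.
    public synchronized int deliver(String recipient, String text) {
        List<String> box = boxes.computeIfAbsent(recipient, k -> new ArrayList<>());
        box.add(text);
        return box.size();
    }

    // "Send me all messages after #n for user X" -- the polling call.
    public synchronized List<String> messagesAfter(String recipient, int n) {
        List<String> box = boxes.getOrDefault(recipient, Collections.emptyList());
        return new ArrayList<>(box.subList(Math.min(n, box.size()), box.size()));
    }

    // One of the "common situations": the user logs off mid-session.
    public synchronized void logOff(String user) {
        boxes.remove(user);
    }
}
```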
The sticky part is: how do you do this over multiple machines? Off the top of my head, I can think of a couple of ways that will scale OK:
- Keep track of which server the recipient is on. Use another transport mechanism to send the message to that server so it can be shoved into memory as if the sender had been local. See "message queuing" or "enterprise service bus."
- Decouple message handling from communication: designate one or more servers to hold the active conversations. Have the server that receives a message forward it to one of those conversation servers; use a notification mechanism (good) or polling (not so good) to alert the recipient's server that there's a chat message waiting to be sent out. Special feature: use a distributed hash table to spread the message mailboxes across the pool of servers, so that if one or more servers fail, the DHT can automatically rebalance.
- Use broadcast: each server broadcasts to all other servers if the recipient is not local. Every server receives the notification; only the one with the recipient does anything with it.
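The first two approaches both need a way to answer "which server owns this user's mailbox?" A consistent-hash ring, the usual DHT building block, has the property mentioned above: when a server fails, only that server's share of mailboxes moves. This is a sketch with hypothetical names and a deliberately cheap hash:

```java
import java.util.*;

// Sketch of the DHT idea: map each user's mailbox to a server via a
// consistent-hash ring, so losing a server only remaps that server's
// share of users. Hypothetical names; real code would use a better hash.
public class MailboxRing {
    private final TreeMap<Integer, String> ring = new TreeMap<>();

    public void addServer(String server) {
        // Multiple replicas per server smooth out the distribution.
        for (int replica = 0; replica < 100; replica++) {
            ring.put(hash(server + "#" + replica), server);
        }
    }

    public void removeServer(String server) {
        ring.values().removeIf(s -> s.equals(server));
    }

    // The server that owns this user's mailbox: the first ring entry at
    // or after the user's hash, wrapping around past the end.
    public String serverFor(String user) {
        Map.Entry<Integer, String> e = ring.ceilingEntry(hash(user));
        return (e != null ? e : ring.firstEntry()).getValue();
    }

    private static int hash(String s) {
        return s.hashCode() * 0x9E3779B9; // cheap integer mix, sketch only
    }
}
```

Lookups cost O(log n) via `TreeMap.ceilingEntry`, and every server can run the same ring locally, so no central routing table is needed.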
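The broadcast approach trades bandwidth for simplicity: no routing state at all. A sketch of both sides, the sending server and the filter every server applies, with hypothetical names and a `Consumer` standing in for the real broadcast transport:

```java
import java.util.*;
import java.util.function.Consumer;

// Sketch of the broadcast approach (hypothetical names): if the recipient
// isn't local, hand the message to a broadcaster that reaches every server;
// each server ignores anything whose recipient it doesn't hold.
public class BroadcastChat {
    private final Set<String> localUsers;
    private final Consumer<String[]> broadcaster; // fans {recipient, text} out to all servers
    final List<String> delivered = new ArrayList<>(); // stand-in for the in-memory mailbox

    public BroadcastChat(Set<String> localUsers, Consumer<String[]> broadcaster) {
        this.localUsers = localUsers;
        this.broadcaster = broadcaster;
    }

    // Called on the server that received the message from the sender.
    public void send(String recipient, String text) {
        if (localUsers.contains(recipient)) {
            deliverLocally(recipient, text);
        } else {
            broadcaster.accept(new String[]{recipient, text});
        }
    }

    // Called on every server for every broadcast; only the server
    // holding the recipient does anything with it.
    public void onBroadcast(String recipient, String text) {
        if (localUsers.contains(recipient)) {
            deliverLocally(recipient, text);
        }
    }

    private void deliverLocally(String recipient, String text) {
        delivered.add(recipient + ": " + text);
    }
}
```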
The key here is that you can no longer make use of shared memory between multiple machines; you have to use some mechanism to move the message between servers. You're unlikely to use a general-purpose, relatively high-overhead protocol (like HTTP) for this; there are lots of more efficient tools, and you can implement it at several levels of abstraction: a shared-cache tool like Terracotta, a peer-to-peer network protocol like JXTA, an enterprise service bus like ActiveMQ, and so on. Depending on how much you want to put on the user's browser, you can even run some message queuing software directly on the client system -- then the notification that there's a new message can go directly to the user instead of to an intermediate mailbox.
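Whichever transport you pick, the receiving server's shape is the same: a consumer loop drains the transport and delivers into the local in-memory mailbox as if the sender had been local. Here a `BlockingQueue` stands in for the real bus (ActiveMQ, JXTA, etc.); everything else is a hypothetical sketch:

```java
import java.util.concurrent.*;

// Sketch of the per-server consumer loop. The BlockingQueue stands in for
// whatever inter-server transport you choose (JMS/ActiveMQ, JXTA, ...).
public class InboundPump implements Runnable {
    private final BlockingQueue<String[]> bus; // {recipient, text} pairs off the wire
    private final ConcurrentMap<String, BlockingQueue<String>> localBoxes =
            new ConcurrentHashMap<>();

    public InboundPump(BlockingQueue<String[]> bus) {
        this.bus = bus;
    }

    public void run() {
        try {
            while (true) {
                String[] msg = bus.take(); // block until a message arrives
                boxFor(msg[0]).add(msg[1]); // deliver as if the sender were local
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // shut down cleanly
        }
    }

    // The recipient's local in-memory mailbox.
    public BlockingQueue<String> boxFor(String user) {
        return localBoxes.computeIfAbsent(user, k -> new LinkedBlockingQueue<>());
    }
}
```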
The clear optimization is to move users with active conversations onto the same server, but that won't work with most load-balancing mechanisms. There ought to be some way to force affinity between a user and a particular server, but I can't think of an easy one.