Socket.IO is much harder to scale than current XMPP solutions (especially ejabberd).
Even if you decide to scale Socket.IO using the Redis store, as most articles suggest, the store concept of Socket.IO is built on the idea of syncing all connection data between every connected node in your cluster. It doesn’t matter which kind of store you use, because this behavior is built into the Socket.IO store interface, not into the individual stores.
To understand why syncing is bad for Socket.IO stores, we first need to know what is synced. We can find this information in the initStore function of the Socket.IO manager. The synced data is:
- Handshake data. This includes ALL request headers, query strings, IP address information, URLs and any custom data that you’ve added during authorization.
- IDs of all connections that are open, connected and even closed.
- Room names and the ID of every connection that has joined each room.
All of this data is synced through pub/sub to every connected Socket.IO server. So if you have 2 Node processes that both use Socket.IO stores, each of them holds the data of all connections in its own process memory, not in Redis as you might assume. That might not be an issue with 500 connected users, but once you go beyond 5,000 connections it adds up quickly.
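To make the memory effect concrete, here is a minimal simulation of the syncing idea. It is not Socket.IO's actual code: the `Node` class, the `handshake` event name and the handshake fields are hypothetical, and a plain `EventEmitter` stands in for Redis pub/sub. It only illustrates that when every handshake is broadcast, every process ends up holding every connection.

```javascript
const { EventEmitter } = require('events');

// Stand-in for Redis pub/sub: every subscriber receives every published message.
const bus = new EventEmitter();

// Hypothetical model of one server process that uses a syncing store.
class Node {
  constructor(name) {
    this.name = name;
    // Connection id -> handshake data, kept in this process's own memory.
    this.handshaken = new Map();
    // Every node stores every handshake it hears about, including other nodes' clients.
    bus.on('handshake', (id, data) => this.handshaken.set(id, data));
  }

  // A client connects to this node; the handshake is published to all nodes.
  accept(id, data) {
    bus.emit('handshake', id, data);
  }
}

const nodes = [new Node('a'), new Node('b')];

// Spread 1,000 connections evenly over the two nodes (500 each).
for (let i = 0; i < 1000; i++) {
  const handshake = {
    headers: { 'user-agent': 'example' }, // assumed fields, for illustration only
    query: {},
    address: '127.0.0.1',
  };
  nodes[i % nodes.length].accept(`conn-${i}`, handshake);
}

// Number of handshakes held in memory by each node.
const sizes = nodes.map((n) => n.handshaken.size);
console.log(sizes);
```

Each node only serves 500 clients here, yet both maps contain all 1,000 entries. In this model, adding more processes does not reduce the per-process memory for connection data; it only adds more copies of it.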
Check the following articles that might be helpful:
Cluster fucks when scaling Socket.IO
Redis Store and Socket.io
How do I scale socket.io servers?