
I'm running a 4-core Amazon EC2 instance (m3.xlarge) with 200,000 concurrent connections and no resource problems (each core at 10-20%, memory at 2/14GB). However, if I emit a message to all connected users, the first user on a CPU core gets it within milliseconds, but the last connected user gets it with a delay of 1-3 seconds, and each CPU core goes up to 100% for 1-2 seconds. I noticed this problem even at "only" 50k concurrent users (12.5k per core).

How to reduce the delay?

I tried switching from the redis adapter to the mongo adapter, with no difference.

I'm using this code to get sticky sessions across multiple CPU cores:

https://github.com/elad/node-cluster-socket.io

The test was very simple: the clients just connect and do nothing more. The server only listens for a message and emits it to all.
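For reference, the broadcast logic amounts to something like this minimal sketch (socket.io v2-era API assumed; `attachHandlers` is just a name I'm using for this example, not something from the project):

```javascript
// Minimal sketch of the test server: clients just connect, and any
// 'message' received from one client is rebroadcast to everyone.
function attachHandlers(io) {
  io.on('connection', (socket) => {
    socket.on('message', (msg) => {
      io.emit('message', msg); // broadcast to all connected clients
    });
  });
}

// In the real server this would be wired up as something like:
//   const io = require('socket.io')(3000);
//   attachHandlers(io);
```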

EDIT: I tested a single core without any cluster/adapter logic with 50k clients and got the same result.

I published the server, single-core server, benchmark, and HTML client in one package: https://github.com/MickL/socket-io-benchmark-kit

Mick

1 Answer


OK, let's break this down a bit. 200,000 users on four cores: if perfectly distributed, that's 50,000 users per core. So if sending a message to a given user takes 0.1ms of CPU time each, it would take 50,000 × 0.1ms = 5 seconds to send to them all.
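That arithmetic can be sketched as a quick helper (the 0.1ms-per-emit figure is an assumption for illustration, not a measurement):

```javascript
// Back-of-envelope: rough wall time to broadcast one message to all
// clients, assuming the workers run in parallel and each emit costs a
// fixed amount of CPU time on its worker.
function broadcastTimeMs(totalClients, cores, msPerEmit) {
  const perCore = totalClients / cores; // clients handled by each worker
  return perCore * msPerEmit;           // wall time ≈ time for one worker's share
}

broadcastTimeMs(200000, 4, 0.1); // ≈ 5000 ms, i.e. the 5 seconds above
```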

If you see CPU utilization go to 100% during this, then the bottleneck is probably the CPU, and maybe you need more cores on the problem. But there may be other bottlenecks too, such as network bandwidth, network adapters, or the redis process. So one thing to determine immediately is whether your end-to-end time is inversely proportional to the number of clusters/CPUs you have. If you drop to 2 cores, does the end-to-end time double? If you go to 8, does it halve? If yes for both, that's good news, because it means you are probably only running into a CPU bottleneck at the moment, not the other bottlenecks. If that's the case, then you need to figure out how to make 200,000 emits across multiple clusters more efficient by examining the node-cluster-socket.io code and finding ways to optimize your specific situation.
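One way to make that proportionality check concrete: if wall time × core count stays roughly constant across runs, the work is scaling with CPU. A hypothetical helper (names and tolerance are mine, for illustration only):

```javascript
// Given measured end-to-end broadcast times at different core counts,
// check whether (cores * seconds) is roughly constant, which is what
// you'd expect from a pure CPU bottleneck with even load distribution.
function looksCpuBound(runs, tolerance = 0.25) {
  // runs: [{ cores, seconds }, ...]
  const products = runs.map((r) => r.cores * r.seconds);
  const avg = products.reduce((a, b) => a + b, 0) / products.length;
  return products.every((p) => Math.abs(p - avg) / avg <= tolerance);
}

// Perfect inverse scaling: 2 cores / 10s, 4 cores / 5s, 8 cores / 2.5s
looksCpuBound([{ cores: 2, seconds: 10 }, { cores: 4, seconds: 5 }, { cores: 8, seconds: 2.5 }]); // true
```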

The optimal code would have every CPU do all its housekeeping to gather exactly what it needs to send to its 50,000 users, and then have each CPU run a tight loop sending those 50,000 network packets one right after the other. I can't really tell from the redis adapter code whether this is what happens or not.
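A hypothetical sketch of that "tight loop" case, where each cluster worker emits only to the sockets it owns with no cross-process lookups per socket (`broadcastLocal` is my name for it; `io.sockets.sockets` is the per-worker connection map in socket.io v2, an object keyed by socket id):

```javascript
// Broadcast to this worker's local connections only: gather nothing
// per-socket, just iterate and emit back-to-back.
function broadcastLocal(io, event, msg) {
  const sockets = io.sockets.sockets; // connections owned by this worker
  for (const id of Object.keys(sockets)) {
    sockets[id].emit(event, msg);     // tight loop, one emit per local socket
  }
}
```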

A much worse case would be one where some process gets all 200,000 socket IDs and then loops over each socket ID, where inside that loop it has to look up in redis which server holds that connection and then send that server a message telling it to emit to that socket. That would be far less efficient than instructing each server to just send a message to all its own connected users.
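The gap between those two designs can be made concrete with a rough count of cross-process operations per broadcast (a simplified model of my own, not socket.io internals):

```javascript
// Rough cross-process operation count per broadcast under each design.
function crossProcessOps(totalClients, workers) {
  return {
    // worst case: one redis lookup + one directed send per socket
    perSocketRouting: 2 * totalClients,
    // better case: one pub/sub message to each of the other workers,
    // which then emit locally to their own clients
    perWorkerBroadcast: workers - 1,
  };
}

crossProcessOps(200000, 4);
// 400,000 cross-process ops vs just 3
```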

It would be worth trying to figure out (by studying the code) where on this spectrum the socket.io + redis combination falls.

Oh, and if you're using an SSL connection for each socket, you are also devoting some CPU to crypto on every send operation. There are ways to offload the SSL processing from your regular CPU (using additional hardware).

jfriend00
  • Thanks for your answer, nice ideas! I think it's proportional: at 50k it took 1-2 seconds (I would like to have <500ms) and with 200k it was about 2-5 seconds. I don't think it is about the node-cluster stuff or redis. I'm pretty sure it is socket.io itself: I could test 1 core with 50k without any clustering. A very simple Node.js script: only on('connect') ... on('message', msg => { socket.emit('message', msg) } ), but I'm pretty sure the result would be the same. – Mick Sep 08 '17 at 13:47
  • What I didn't test, but scares me: what if there were 2, 3, 4, 5 emits each second to all users? I guess the CPU would stay at 100%, stutter, and V8 would never catch up. Anyway, 0.1ms per user sounds pretty fast. But yes, that's 5 seconds for all :( – Mick Sep 08 '17 at 13:51
  • @Mick - I added a few more thoughts. – jfriend00 Sep 08 '17 at 13:54
  • I will try 1 core with 50k without clustering/socket-adapter and see if something changes. If yes, then redis/mongodb is the problem. – Mick Sep 08 '17 at 13:56
  • I shared my toolkit here: https://github.com/MickL/socket-io-benchmark-kit - It contains a cluster server, a single-core server, a benchmark using Artillery, and an HTML client – Mick Sep 08 '17 at 14:11
  • @Mick - How are you making 50,000 client connections? If you're doing a large number of connections from a single computer (often done for testing purposes), then you may be introducing additional bottlenecks in that computer that may not be present in the real world since all those connections share the same CPU and network adapter and client bandwidth. – jfriend00 Sep 08 '17 at 14:11
  • I have 4 AWS instances, each running 4 Artillery processes connecting 25 users per second per process (100 per instance per second, 400 total per second). The server seems to have no problem with 400 new connections per second; only 30% CPU usage. The source code I used can be found in the GitHub project. I launched "benchmark" with pm2 4 times per instance. – Mick Sep 08 '17 at 14:16
  • So, you have 200,000 clients in 16 processes? Some part of the bottleneck could be there too since all 200,000 messages have to be processed by the 16 processes in 4 AWS instances. How big are the messages you're sending and what network bandwidth do you have to each server box and each client box? It might also be worth a network bandwidth calculation where you have to also add in some TCP overhead for each message and for confirmation ACKs to see how saturated your network links are. – jfriend00 Sep 08 '17 at 14:23
  • Let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/153963/discussion-between-mick-and-jfriend00). – Mick Sep 08 '17 at 14:45
  • Currently I'm testing on a single core and see exactly the same thing: the first user gets it immediately, the 10,000th user around 1 second, the 20,000th user around 1-2 seconds. ALSO, if I send 5-10 messages one after another, they don't get emitted for several seconds, even to the first user. The first client gets 1-3, and then 5-10 seconds later he gets the rest. CPU is at 100% during this time. – Mick Sep 08 '17 at 14:51
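The network-bandwidth calculation suggested in the comments above could be sketched like this (the 66-byte per-packet figure is an assumed approximation of Ethernet + IP + TCP header overhead; ACK traffic in the return direction is extra):

```javascript
// Back-of-envelope: total bytes on the wire for one broadcast, counting
// a fixed assumed per-packet header overhead on top of the payload.
function broadcastBytes(clients, payloadBytes, overheadBytes = 66) {
  return clients * (payloadBytes + overheadBytes);
}

// e.g. 200,000 clients * (100B payload + 66B headers) = 33,200,000 bytes,
// roughly 33 MB (about 266 Mbit) leaving the servers per broadcast
broadcastBytes(200000, 100);
```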