
We have been evaluating the Spring WebSocket + STOMP + broker stack for a full-duplex messaging application that will run on AWS. We had hoped to use Amazon MQ. We push messages to individual users and also broadcast, so functionally the stack looked good. We have about 40,000 - 80,000 users. Load testing quickly showed that neither the Spring stack nor Amazon MQ scales well. The issues:

  1. A Spring Cloud Gateway instance cannot handle more than about 3,000 WebSockets before dying.
  2. A Spring WebSocket server instance can also only handle about 4,000 WebSockets on a T3.Medium, even when we bypass the Gateway.
  3. AWS limits Amazon MQ (ActiveMQ) connections to 100 on a small instance, and then only 1,000 on a massive instance. There is no in-between, which is just weird.

Yes, we have increased the file handles etc. on the machines, so TCP connections are not the limit; there is no way Spring could ever get close to the OS limits here. For load we are sending an 18 KB message, the maximum we expect. In our results, message size has little impact; it is just the per-connection overhead on the Spring stack.

The StompBrokerRelayMessageHandler opens a new TCP connection to the broker for every STOMP CONNECT, and there is no way to pool these connections. This makes the feature effectively useless for any 'real' web application. Supporting our users would require about 40 of AWS's biggest MQ instances, which makes this solution ridiculously expensive. Meanwhile, in load testing the Amazon MQ machine is doing nothing; with 1,000 users it is not loaded. In reality a couple of medium-sized machines is all we need for all our brokers.
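
For reference, this is roughly how the relay is enabled; a minimal sketch, with the endpoint path, host, port, and credentials as placeholders. With this setup, every client CONNECT produces its own TCP connection from the application server to the broker:

```java
import org.springframework.context.annotation.Configuration;
import org.springframework.messaging.simp.config.MessageBrokerRegistry;
import org.springframework.web.socket.config.annotation.EnableWebSocketMessageBroker;
import org.springframework.web.socket.config.annotation.StompEndpointRegistry;
import org.springframework.web.socket.config.annotation.WebSocketMessageBrokerConfigurer;

@Configuration
@EnableWebSocketMessageBroker
public class WebSocketConfig implements WebSocketMessageBrokerConfigurer {

    @Override
    public void registerStompEndpoints(StompEndpointRegistry registry) {
        registry.addEndpoint("/ws"); // placeholder endpoint path
    }

    @Override
    public void configureMessageBroker(MessageBrokerRegistry registry) {
        registry.setApplicationDestinationPrefixes("/app");
        // Each client STOMP CONNECT results in a dedicated TCP
        // connection from this server to the broker; this is what
        // exhausts the Amazon MQ connection limit so quickly.
        registry.enableStompBrokerRelay("/topic", "/queue")
                .setRelayHost("broker.example.com") // placeholder
                .setRelayPort(61613)
                .setClientLogin("user")             // placeholders
                .setClientPasscode("pass")
                .setSystemLogin("user")
                .setSystemPasscode("pass");
    }
}
```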

Has anyone ever built a real-world solution like the above using the Spring stack? It appears no one has done this, and no one has scaled it up. Has anyone written a pooling StompBrokerRelayMessageHandler? I assume there must be a reason this can't work, as it should be the default approach. What is the issue here?

This issue seems to make the whole Spring WebSocket + STOMP + broker approach pretty much useless, and we are now forced into a different approach for message reliability and for messaging across servers to users who are not connected locally (the main reason we were using a broker). We have gone back to using a SimpleBroker and wrote a registry to manage which server each client is connected to. So we have now eliminated the broker, and the figures above are with that model. We may then add AWS SQS for message reliability.
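
Roughly the shape of what we ended up with; a sketch, where the location registry is our own code, the names are illustrative, and the in-memory map would need to be replaced by a shared store (e.g. Redis) across instances:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

import org.springframework.context.annotation.Configuration;
import org.springframework.messaging.simp.config.MessageBrokerRegistry;
import org.springframework.stereotype.Component;
import org.springframework.web.socket.config.annotation.EnableWebSocketMessageBroker;
import org.springframework.web.socket.config.annotation.StompEndpointRegistry;
import org.springframework.web.socket.config.annotation.WebSocketMessageBrokerConfigurer;

@Configuration
@EnableWebSocketMessageBroker
public class SimpleBrokerConfig implements WebSocketMessageBrokerConfigurer {

    @Override
    public void registerStompEndpoints(StompEndpointRegistry registry) {
        registry.addEndpoint("/ws"); // placeholder endpoint path
    }

    @Override
    public void configureMessageBroker(MessageBrokerRegistry registry) {
        // In-memory broker only: no external broker connections at all.
        registry.enableSimpleBroker("/topic", "/queue");
        registry.setApplicationDestinationPrefixes("/app");
    }
}

/**
 * Illustrative client-location registry: records which server
 * instance owns each user's WebSocket session, so another instance
 * can route a message to the right server (e.g. via SQS or HTTP).
 */
@Component
class ClientLocationRegistry {

    private final Map<String, String> userToServer = new ConcurrentHashMap<>();

    public void register(String userId, String serverId) {
        userToServer.put(userId, serverId);
    }

    public void unregister(String userId) {
        userToServer.remove(userId);
    }

    public String serverFor(String userId) {
        return userToServer.get(userId); // null if user not connected
    }
}
```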

What's left? We were going to use Spring Cloud Gateway to load balance across multiple small WebSocket servers, but it seems this approach will not work: the WebSocket load a single Gateway can handle is just way too small. We are now removing Spring Cloud Gateway and using an AWS load balancer instead, which lets us load balance significantly more connections. Why can't Spring Cloud Gateway load balance this?
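
For completeness, the kind of route we were testing through the Gateway; a sketch, with the route id and service id as placeholders, assuming a registered ws-service behind the lb scheme:

```java
import org.springframework.cloud.gateway.route.RouteLocator;
import org.springframework.cloud.gateway.route.builder.RouteLocatorBuilder;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
public class GatewayRoutes {

    // Routes WebSocket upgrade requests on /ws/** to the backend
    // WebSocket servers via client-side load balancing.
    @Bean
    public RouteLocator wsRoutes(RouteLocatorBuilder builder) {
        return builder.routes()
                .route("ws-route", r -> r.path("/ws/**")
                        .uri("lb:ws://ws-service")) // placeholder service id
                .build();
    }
}
```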

What's left? The WebSocket server instances are t3.mediums; they have no business logic and just pass a message between two clients, so they really should not need a bigger server. We would expect considerably better than 4,000 connections, but this is at least close to usable.
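
If anyone wants to suggest tuning, this is the level we can adjust on the Tomcat-based servers; a sketch of the standard JSR-356 container settings, with illustrative values rather than recommendations:

```java
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.web.socket.server.standard.ServletServerContainerFactoryBean;

@Configuration
public class WebSocketContainerConfig {

    // Buffer sizes matter to us because our largest message is ~18 KB.
    @Bean
    public ServletServerContainerFactoryBean createWebSocketContainer() {
        ServletServerContainerFactoryBean container = new ServletServerContainerFactoryBean();
        container.setMaxTextMessageBufferSize(32 * 1024);    // > 18 KB payloads
        container.setMaxBinaryMessageBufferSize(32 * 1024);
        container.setMaxSessionIdleTimeout(10 * 60 * 1000L); // 10 minutes
        return container;
    }
}
```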

We are now drilling into where the performance bottlenecks are, but the lack of any tuning guides or scaling information does not suggest good things about Spring. Compare this to Node solutions, which scale very well and handle larger numbers of connections on small machines.

The next approach is to look at WebFlux + WebSocket, but then we lose STOMP. Maybe we'll check raw WebSockets?
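
That would mean handling frames ourselves; a minimal sketch of a raw reactive handler, assuming we drop STOMP entirely (the path and echo logic are placeholders for our own routing):

```java
import java.util.Map;

import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.web.reactive.handler.SimpleUrlHandlerMapping;
import org.springframework.web.reactive.socket.WebSocketHandler;
import org.springframework.web.reactive.socket.server.support.WebSocketHandlerAdapter;

@Configuration
public class ReactiveWebSocketConfig {

    // Raw reactive handler: no STOMP framing, we route messages ourselves.
    @Bean
    public WebSocketHandler wsHandler() {
        return session -> session.send(
                session.receive()
                        // Echo each text frame back; real routing logic
                        // (user lookup, cross-server relay) would go here.
                        .map(msg -> session.textMessage(msg.getPayloadAsText())));
    }

    @Bean
    public SimpleUrlHandlerMapping wsMapping(WebSocketHandler wsHandler) {
        return new SimpleUrlHandlerMapping(Map.of("/ws", wsHandler), -1); // high precedence
    }

    @Bean
    public WebSocketHandlerAdapter handlerAdapter() {
        return new WebSocketHandlerAdapter();
    }
}
```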

This is just an early attempt to see whether anyone has actually used Spring WebSockets in anger and can share a real, working production architecture, as only toy examples are available. Any help on the above issues would be appreciated.

Kim Horn