Throughput can be affected when there are many matching subscriptions on a given subject. For instance, suppose you are using 1 connection and are publishing on foo, and there are 100 subscriptions on foo. When the server gets a message, it will deliver it to all matching subscriptions, which means writing this message to TCP 100 times, regardless of whether the subscriptions belong to the same connection or not.
While doing the sends to subscribers, the server is not reading other messages published by this connection.
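To make that scenario concrete, here is a minimal sketch using the Go client (the use of nats.go, the default local URL, and the counter are my assumptions for illustration, not part of your setup):

```go
package main

import (
	"fmt"
	"log"
	"sync/atomic"
	"time"

	"github.com/nats-io/nats.go"
)

func main() {
	nc, err := nats.Connect(nats.DefaultURL)
	if err != nil {
		log.Fatal(err)
	}
	defer nc.Close()

	// 100 subscriptions all matching "foo": the server writes every
	// published message 100 times, even though they share one connection.
	var delivered int64
	for i := 0; i < 100; i++ {
		if _, err := nc.Subscribe("foo", func(m *nats.Msg) {
			atomic.AddInt64(&delivered, 1)
		}); err != nil {
			log.Fatal(err)
		}
	}
	nc.Flush() // make sure the subscriptions are registered before publishing

	// One publish, 100 deliveries.
	if err := nc.Publish("foo", []byte("hello")); err != nil {
		log.Fatal(err)
	}
	nc.Flush()
	time.Sleep(100 * time.Millisecond) // crude wait for the async callbacks
	fmt.Println("deliveries:", atomic.LoadInt64(&delivered))
}
```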
Scaling horizontally by adding servers may help if you also distribute the load of subscriptions across the cluster. In the example above, say that 50 subscribers are on one server and 50 on another: the server receiving the published message now has to send that message only 50+1 times (50 for its local subscribers, 1 for the route), and the other server then sends 50 messages to its local subscribers.
But merely adding servers will not improve a single connection's throughput if there is only one (or no) matching subscription.
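As a sketch of what distributing the subscribers could look like with the Go client (the cluster URLs server-a/server-b are placeholders, and in practice the subscribers would likely be separate processes):

```go
package main

import (
	"log"

	"github.com/nats-io/nats.go"
)

func main() {
	// Two clustered servers; replace these with your own cluster members.
	urls := []string{"nats://server-a:4222", "nats://server-b:4222"}

	// Alternate so that 50 subscriptions land on each server. The server
	// that receives a publish then sends 50 local copies plus 1 route copy.
	for i := 0; i < 100; i++ {
		nc, err := nats.Connect(urls[i%len(urls)])
		if err != nil {
			log.Fatal(err)
		}
		if _, err := nc.Subscribe("foo", func(m *nats.Msg) {
			// handle the message
		}); err != nil {
			log.Fatal(err)
		}
	}

	select {} // keep the subscribers alive for the demo
}
```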
Another way to improve publish throughput is to use more connections. Since the server uses a goroutine per connection (to read data from the socket and then send to matching subscriptions), some of the work can be parallelized.
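A rough sketch of fanning the publishing work out over several connections, again assuming the Go client (connection and message counts are arbitrary):

```go
package main

import (
	"log"
	"sync"

	"github.com/nats-io/nats.go"
)

func main() {
	const numConns = 4 // each connection gets its own reader goroutine on the server
	const msgsPerConn = 250000

	conns := make([]*nats.Conn, numConns)
	for i := range conns {
		nc, err := nats.Connect(nats.DefaultURL)
		if err != nil {
			log.Fatal(err)
		}
		defer nc.Close()
		conns[i] = nc
	}

	payload := []byte("hello")
	var wg sync.WaitGroup
	for _, nc := range conns {
		wg.Add(1)
		go func(nc *nats.Conn) {
			defer wg.Done()
			for i := 0; i < msgsPerConn; i++ {
				if err := nc.Publish("foo", payload); err != nil {
					log.Println(err)
					return
				}
			}
			nc.Flush() // push any buffered data out before finishing
		}(nc)
	}
	wg.Wait()
}
```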
You could run some of the benchmarks included in the repo to get the upper limit you can reach on your machine. For instance, the server bench tests usually write data directly to the socket instead of using a NATS client. This is to measure the server performance without any limitation imposed by the client implementation:
```
go test -v -run=xxx -bench=. ./test
```
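You can also narrow the run with go test's usual -bench filter if you only care about a subset, e.g. (assuming the publish benchmark names match Pub; check the test files for the exact names):

```
go test -v -run=xxx -bench=Pub ./test
```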
Make sure to look at the way you send messages and how they are processed in the subscriptions' callbacks. Anything you can do to improve performance there will be of greater value.
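For example, one common pattern is to keep the subscription callback itself cheap and hand expensive processing off to worker goroutines, since callbacks for a given subscription are dispatched sequentially. A sketch (worker count, channel size, and the process helper are arbitrary placeholders):

```go
package main

import (
	"log"

	"github.com/nats-io/nats.go"
)

func process(m *nats.Msg) {
	// placeholder for application-specific (possibly slow) handling
}

func main() {
	nc, err := nats.Connect(nats.DefaultURL)
	if err != nil {
		log.Fatal(err)
	}
	defer nc.Close()

	// A slow callback throttles delivery for its subscription, so keep it
	// cheap: just enqueue, and let workers do the heavy lifting.
	work := make(chan *nats.Msg, 1024)
	for i := 0; i < 4; i++ {
		go func() {
			for m := range work {
				process(m) // the expensive part runs off the delivery path
			}
		}()
	}

	if _, err := nc.Subscribe("foo", func(m *nats.Msg) {
		work <- m // blocks only when the buffer is full (natural backpressure)
	}); err != nil {
		log.Fatal(err)
	}

	select {} // block forever for the demo
}
```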
Hope this helps.