
I have a very simple Node.js application that accepts JSON data (approx. 1 KB) via a POST request body. The response is sent back to the client immediately, and the JSON is then posted asynchronously to an Apache Kafka queue. The number of simultaneous requests can go as high as 10000 per second, which we are simulating using Apache JMeter running on three different machines. The target is an average response time of under one second with no failed requests.
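For reference, the "acknowledge first, publish later" flow looks roughly like this (a minimal sketch of the pattern, not our exact code; `publish` stands in for a real Kafka producer's send call, and the topic name is illustrative):

```javascript
// Respond to the client immediately, then hand the payload to Kafka
// asynchronously. publish() is a stand-in for a real producer's send();
// respond() stands in for writing the HTTP response.
function handleMessage(json, respond, publish) {
  // 1. Acknowledge right away so the client never waits on Kafka.
  respond(200, 'accepted');

  // 2. Publish on the next event-loop turn; a failure is logged,
  //    not reported back to the (already answered) client.
  setImmediate(() => {
    try {
      publish('events', JSON.stringify(json));
    } catch (err) {
      console.error('kafka publish failed:', err);
    }
  });
}
```

The key property is that the response is written before the Kafka call is even scheduled, so Kafka latency never shows up in the client-visible response time.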

On a 4-core machine, the app handles up to 4015 requests per second without any failures. However, since the target is 10000 requests per second, we deployed the Node app in a clustered environment.

Both clustering on the same machine and clustering across two different machines (as described here) were implemented. Nginx was used as a load balancer to round-robin the incoming requests between the two Node instances. We expected a significant improvement in throughput (as documented here), but the results were the opposite: the number of successful requests dropped to around 3100 per second.
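For context, the Nginx setup was roughly the following (a sketch with illustrative addresses and ports, not our exact config; round robin is Nginx's default upstream behavior):

```nginx
# Round-robin (the default) across the two Node instances.
upstream node_app {
    server 10.0.0.1:3000;
    server 10.0.0.2:3000;
    # keepalive 64;  # reusing upstream connections can matter at ~10k req/s
}

server {
    listen 80;
    location / {
        proxy_pass http://node_app;
        proxy_http_version 1.1;
        proxy_set_header Connection "";  # needed for upstream keepalive
    }
}
```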

My questions are:

  1. What could have gone wrong in the clustered approach?
  2. Is this even the right way to increase the throughput of a Node application?
  3. We also did a similar exercise with a Java web application in a Tomcat container, and it performed as expected: about 4000 requests per second with a single instance and around 5000 successful requests per second in a cluster with two instances. This contradicts our belief that Node.js performs better than Tomcat. Is Tomcat generally better because of its thread-per-request model?

Thanks a lot in advance.

Ashok
  • Clustering is generally the right approach, but whether or not it helps depends upon where your bottleneck is. You will need to do some measuring and some experiments to determine that. If you are CPU-bound and running on a multi-core computer, then clustering should help significantly. I wonder if your bottleneck is something besides CPU such as networking or other shared I/O or even Nginx? If that's the case, then you need to fix that before you would see the benefits of clustering. – jfriend00 Jun 27 '17 at 06:10
  • **Is tomcat generally better because of its thread per request model?** No. That's not a good generalization. If you are CPU-bound, then threading can help (and so can clustering with nodejs). But, if you are I/O bound, then threads are often more expensive than async I/O like nodejs because of the resource overhead of the threads themselves and the overhead of context switching between threads. – jfriend00 Jun 27 '17 at 06:12
  • @jfriend00 Thanks for the inputs. I shall try to do this one step at a time to see the change in the throughput. Guess that should give us a hint on where the bottleneck is. I forgot to mention that for http, we are using express instead of the native http provided by node. Hope it does not introduce an overhead to the request handling? Will check that if no breakthrough in the other areas. – Ashok Jun 28 '17 at 07:14
  • @jfriend00 I would like to accept your response as the answer, but it is a comment. can it be done still somehow? – Ashok Jun 28 '17 at 07:24

2 Answers


Per your request, I'll put my comments into an answer:

Clustering is generally the right approach, but whether or not it helps depends upon where your bottleneck is. You will need to do some measuring and some experiments to determine that. If you are CPU-bound and running on a multi-core computer, then clustering should help significantly. I wonder if your bottleneck is something besides CPU such as networking or other shared I/O or even Nginx? If that's the case, then you need to fix that before you would see the benefits of clustering.

Is tomcat generally better because of its thread per request model?

No. That's not a good generalization. If you are CPU-bound, then threading can help (and so can clustering with nodejs). But, if you are I/O bound, then threads are often more expensive than async I/O like nodejs because of the resource overhead of the threads themselves and the overhead of context switching between threads. Many apps are I/O bound which is one of the reasons node.js can be a very good choice for server design.

I forgot to mention that for http, we are using express instead of the native http provided by node. Hope it does not introduce an overhead to the request handling?

Express is very efficient and should not be the source of any of your issues.

jfriend00

As jfriend00 said, you need to find the bottlenecks. One thing you can try is to reduce the bandwidth per request by using sockets to pass the JSON, especially with this library: https://github.com/uNetworking/uWebSockets. The main reason is that an HTTP request carries significantly more overhead than a message over an established socket connection.

Good example: https://webcheerz.com/one-million-requests-per-second-node-js/

Lastly, you can also compress the JSON via HTTP gzip or a third-party module.

work on the weight ^^

Hope it helps!

Revln9
  • Thanks for the pointer to WebSockets. In fact, we are looking at MQTT for our purposes. Will check the compression option too. – Ashok Jun 28 '17 at 07:26