
In case someone wants to try: https://github.com/codependent/cluster-performance

I am testing Node.js (v0.11.13, Windows 7) requests-per-second limits with a simple application. I have implemented a service with Express 4 that simulates an I/O operation, such as a DB query, with a setTimeout callback.

First I test it with only one Node process. For the second test I start as many workers as the machine has CPUs.

I am using loadtest to test the service with the following parameters:

loadtest -n 50000 -c 220 -k http://localhost:5000/operations/timeout/20

That is to say: 50k total requests, 220 concurrent clients, using keep-alive connections (-k).

My service sets the timeout (the simulated processing time) according to the last URL parameter (20 ms):

router.route('/timeout/:time')
.get(function(req, res) {
    // simulate an async I/O operation (e.g. a DB query) that takes :time ms
    setTimeout(function(){
        appLog.debug("Timeout completed %d", process.pid);
        res.status(200).json({result: process.pid});
    }, parseInt(req.params.time, 10));
});
1. Only one node process

These are the results:

INFO Max requests:        50000
INFO Concurrency level:   200
INFO Agent:               keepalive
INFO
INFO Completed requests:  50000
INFO Total errors:        0
INFO Total time:          19.326443741 s
INFO Requests per second: 2587
INFO Total time:          19.326443741 s
INFO
INFO Percentage of the requests served within a certain time
INFO   50%      75 ms
INFO   90%      92 ms
INFO   95%      100 ms
INFO   99%      117 ms
INFO  100%      238 ms (longest request)

2587 requests per second, not bad.

2. n workers (n = numCPUs)

In this case I distribute the load equally among the workers using the round-robin scheduling policy. Since there are now 8 cores processing requests, I was expecting a significant improvement in the requests-per-second results (8 times faster?), but it only increased to 2905 rps (318 rps more). How can you explain that? Am I doing something wrong?

Results:

Max requests:        50000
Concurrency level:   220
Agent:               keepalive

Completed requests:  50000
Total errors:        0
Total time:          17.209989764000003 s
Requests per second: 2905
Total time:          17.209989764000003 s

Percentage of the requests served within a certain time
  50%      69 ms
  90%      103 ms
  95%      112 ms
  99%      143 ms
 100%      284 ms (longest request)

My cluster initialization code:

#!/usr/bin/env node
var nconf = require('../lib/config');
var app = require('express')();
var debug = require('debug')('mma-nodevents');
var http = require('http');
var appConfigurer = require('../app');
var cluster = require('cluster');
var numCPUs = require('os').cpus().length;

// Explicitly select round-robin scheduling (note: this lexicographic
// version-string comparison is fragile, but it works for v0.11.13)
if ('v0.11.13'.localeCompare(process.version) >= 0) {
    cluster.schedulingPolicy = cluster.SCHED_RR;
}

if (cluster.isMaster) {
    // Fork one worker per CPU core
    for (var i = 0; i < numCPUs; i++) {
        cluster.fork();
    }
    // Replace any worker that dies
    cluster.on('exit', function(worker, code, signal) {
        console.log('worker ' + worker.process.pid + ' died');
        cluster.fork();
    });
} else {
    console.log('starting worker [%d]', process.pid);
    appConfigurer(app);
    var server = http.createServer(app);
    server.listen(nconf.get('port'), function(){
        debug('Express server listening on port ' + nconf.get('port'));
    });
}

module.exports = app;

UPDATE:

I have finally accepted slebetman's answer, since he was right about why cluster performance didn't increase significantly with up to 8 processes in this case. However, I would like to point out an interesting fact: with the current io.js version (2.4.0), performance has really improved, even for this purely I/O-bound operation (setTimeout):

loadtest -n 50000 -c 220 -k http://localhost:5000/operations/timeout/20

Single thread:

Max requests:        50000
Concurrency level:   220
Agent:               keepalive

Completed requests:  50000
Total errors:        0
Total time:          13.391324847 s
Requests per second: 3734
Total time:          13.391324847 s

Percentage of the requests served within a certain time
  50%      57 ms
  90%      67 ms
  95%      74 ms
  99%      118 ms
 100%      230 ms (longest request)

8 core cluster:

Max requests:        50000
Concurrency level:   220
Agent:               keepalive

Completed requests:  50000
Total errors:        0
Total time:          8.253544166 s
Requests per second: 6058
Total time:          8.253544166 s

Percentage of the requests served within a certain time
  50%      35 ms
  90%      47 ms
  95%      52 ms
  99%      68 ms
 100%      178 ms (longest request)

So it's clear that with the current io.js/node.js releases, although you don't get an 8x rps increase, throughput is roughly 1.6 times higher (6058 vs 3734 rps).

On the other hand, as expected, when the handler instead runs a for loop for the number of milliseconds indicated in the request (thus blocking the thread), the rps increases roughly in proportion to the number of worker processes; a sketch of that variant is below.
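For reference, a minimal sketch of that blocking variant (a hypothetical /busy/:time route, not part of the repo above; the loop burns CPU for roughly the requested number of milliseconds instead of yielding to the event loop):

// Hypothetical CPU-bound counterpart to the timeout route: it spins for
// about :time milliseconds, keeping this worker's single thread busy.
router.route('/busy/:time')
.get(function(req, res) {
    var end = Date.now() + parseInt(req.params.time, 10);
    while (Date.now() < end) {
        // busy-wait: blocks the event loop, so this worker cannot serve
        // any other request until the loop finishes
    }
    res.status(200).json({result: process.pid});
});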

codependent
  • Amdahl's law states that even in the most optimistic scenario you wouldn't get the 8x improvement you're looking for. There can be many possible explanations for why your results didn't improve much: your app either doesn't scale well, or the server it ran on doesn't really have 8 CPUs available, etc. – user2717954 Nov 07 '14 at 08:05
  • Hehe, I knew 8x was too optimistic. Anyway, with 8 CPUs available to do some work it should actually improve the rps, whereas it doesn't even double the performance. As for "your app doesn't scale well": you can see the code is quite straightforward, a route with a setTimeout(); how come it wouldn't scale? – codependent Nov 07 '14 at 12:04

4 Answers


I/O operations are exactly the kind of workload Node.js was designed and optimized for. I/O operations (and setTimeout) essentially run in parallel, as much as the hardware (network, disk, PCI bridge, DMA controller, etc.) allows.

Once you realize this, it's easy to understand why running many parallel I/O operations in a single process takes roughly the same amount of time as running them in many processes/threads. Indeed, running many parallel I/O operations in one process is directly analogous to running single blocking I/O operations in many parallel processes.

Clustering allows you to use multiple CPUs/cores if you have them. But your process doesn't spend CPU cycles (it just waits on a timer), so clustering gives you very little advantage (if any).
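You can see this in isolation with a tiny script (my sketch, not part of the answer): thousands of concurrent timers in a single process all complete after roughly one timeout interval, not one after another:

// Sketch: 10000 concurrent 20 ms timers in ONE process finish in roughly
// 20 ms of wall time (plus scheduling overhead), not 10000 * 20 ms.
var pending = 10000;
var start = Date.now();
for (var i = 0; i < pending; i++) {
    setTimeout(function () {
        if (--pending === 0) {
            console.log('all timers done in %d ms', Date.now() - start);
        }
    }, 20);
}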

slebetman

Have you tried splitting the loadtesting program itself between two or more processes? It's entirely possible you've instead reached the limits of the loadtest application.
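For example (a sketch, assuming a Unix-like shell; on Windows you would start two consoles instead), the same 50k requests split across two generator processes:

loadtest -n 25000 -c 110 -k http://localhost:5000/operations/timeout/20 &
loadtest -n 25000 -c 110 -k http://localhost:5000/operations/timeout/20 &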

krillr
  • I believe this needs to be explored further. Perhaps run the load test from an external machine rather than locally, since the load generator itself becomes very busy as the test scales. – Jason May 14 '23 at 02:52

A simple calculation:

1000 / 20 * 220 = 11000   // theoretical max requests per second

Each request takes 20 ms, so each of the 220 concurrent clients can complete at most 1000/20 = 50 requests per second, giving a theoretical ceiling of 11,000 rps. You are testing on localhost, which means network time is negligible, so my guess is that the log output is blocking:

appLog.debug("Timeout completed %d", process.pid);

Comment it out and try again.
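That is, the same route as in the question with the debug call commented out:

router.route('/timeout/:time')
.get(function(req, res) {
    setTimeout(function(){
        // appLog.debug("Timeout completed %d", process.pid);
        res.status(200).json({result: process.pid});
    }, parseInt(req.params.time, 10));
});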

tangxinfa

Don't use cluster.SCHED_RR, just use cluster.SCHED_NONE.

eko