In case someone wants to try: https://github.com/codependent/cluster-performance
I am testing the requests-per-second limit of Node.js (v0.11.13, Windows 7) with a simple application. I have implemented a service with Express 4 that simulates an I/O operation, such as a DB query, with a setTimeout callback.
First I test it with only one Node process. For the second test I start as many workers as the machine has CPUs.
I am using loadtest to test the service with the following parameters:
loadtest -n 50000 -c 220 -k http://localhost:5000/operations/timeout/20
That is to say, 50k total requests with 220 concurrent clients, using keep-alive connections (-k).
My service sets the timeout (the duration of the simulated processing) according to the last URL parameter (20 ms):
router.route('/timeout/:time')
    .get(function (req, res) {
        // Simulate an I/O-bound operation (e.g. a DB query): wait :time ms,
        // then respond with the pid of the worker that handled the request.
        setTimeout(function () {
            appLog.debug("Timeout completed %d", process.pid);
            res.status(200).json({ result: process.pid });
        }, parseInt(req.params.time, 10));
    });
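For a quick sanity check of the endpoint before load testing, something like this works (a hypothetical smoke-test snippet, assuming the server is listening on port 5000 as in the loadtest command above):

var http = require('http');
http.get('http://localhost:5000/operations/timeout/20', function (res) {
    var body = '';
    res.on('data', function (chunk) { body += chunk; });
    res.on('end', function () {
        // Prints something like {"result":4711} after roughly 20 ms
        console.log(body);
    });
});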
- Only one Node process
These are the results:
INFO Max requests: 50000
INFO Concurrency level: 200
INFO Agent: keepalive
INFO
INFO Completed requests: 50000
INFO Total errors: 0
INFO Total time: 19.326443741 s
INFO Requests per second: 2587
INFO
INFO Percentage of the requests served within a certain time
INFO 50% 75 ms
INFO 90% 92 ms
INFO 95% 100 ms
INFO 99% 117 ms
INFO 100% 238 ms (longest request)
2587 requests per second, not bad.
- n workers (n = numCPUs)
In this case I distribute the load equally among the workers using the round-robin scheduling policy. Since there are now 8 cores processing requests, I was expecting a significant improvement (8 times faster?) in the requests-per-second results, but it only increased to 2905 rps (318 rps more). How can you explain that? Am I doing something wrong? (A back-of-envelope check on these numbers follows the results below.)
Results:
Max requests: 50000
Concurrency level: 220
Agent: keepalive
Completed requests: 50000
Total errors: 0
Total time: 17.209989764000003 s
Requests per second: 2905
Percentage of the requests served within a certain time
50% 69 ms
90% 103 ms
95% 112 ms
99% 143 ms
100% 284 ms (longest request)
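A back-of-envelope check (my own reasoning, not part of the loadtest output): with keep-alive connections, throughput is roughly bounded by concurrency divided by latency (Little's law). With 220 requests in flight and a median latency of about 75 ms, the single-process ceiling is around 220 / 0.075 ≈ 2900 rps; with the cluster's ~69 ms median it is around 220 / 0.069 ≈ 3200 rps. Since the 20 ms setTimeout never blocks the event loop, a single process can already keep all 220 connections busy, so the limit comes from the client's concurrency and per-request latency rather than from CPU, and adding workers cannot multiply it.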
My cluster initialization code:
#!/usr/bin/env node
var nconf = require('../lib/config');
var app = require('express')();
var debug = require('debug')('mma-nodevents');
var http = require('http');
var appConfigurer = require('../app');
var cluster = require('cluster');
var numCPUs = require('os').cpus().length;

// Explicitly select round-robin scheduling (not the default on Windows).
if ('v0.11.13'.localeCompare(process.version) >= 0) {
    cluster.schedulingPolicy = cluster.SCHED_RR;
}

if (cluster.isMaster) {
    // Fork one worker per CPU.
    for (var i = 0; i < numCPUs; i++) {
        cluster.fork();
    }
    // Replace any worker that dies.
    cluster.on('exit', function (worker, code, signal) {
        console.log('worker ' + worker.process.pid + ' died');
        cluster.fork();
    });
} else {
    console.log('starting worker [%d]', process.pid);
    appConfigurer(app);
    var server = http.createServer(app);
    server.listen(nconf.get('port'), function () {
        debug('Express server listening on port ' + nconf.get('port'));
    });
}

module.exports = app;
UPDATE:
I have finally accepted slebetman's answer, since he was right about why cluster performance didn't increase significantly with up to 8 processes in this case. However, I would like to point out an interesting fact: with the current io.js version (2.4.0), performance has really improved, even for this I/O-heavy operation (setTimeout):
loadtest -n 50000 -c 220 -k http://localhost:5000/operations/timeout/20
Single thread:
Max requests: 50000
Concurrency level: 220
Agent: keepalive
Completed requests: 50000
Total errors: 0
Total time: 13.391324847 s
Requests per second: 3734
Percentage of the requests served within a certain time
50% 57 ms
90% 67 ms
95% 74 ms
99% 118 ms
100% 230 ms (longest request)
8 core cluster:
Max requests: 50000
Concurrency level: 220
Agent: keepalive
Completed requests: 50000
Total errors: 0
Total time: 8.253544166 s
Requests per second: 6058
Percentage of the requests served within a certain time
50% 35 ms
90% 47 ms
95% 52 ms
99% 68 ms
100% 178 ms (longest request)
So it's clear that with the current io.js/node.js releases, although you don't get an 8x rps increase, the throughput is about 1.6 times higher (6058 vs. 3734 rps).
On the other hand, as expected, when the handler instead runs a for loop for the number of milliseconds indicated in the request (thus blocking the event loop), the rps increases roughly in proportion to the number of workers (see the sketch below).
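For reference, a minimal sketch of what such a CPU-bound route could look like (the /busy path is hypothetical, modeled on the timeout route above):

router.route('/busy/:time')
    .get(function (req, res) {
        // Busy-wait for :time ms; unlike setTimeout, this keeps the event
        // loop occupied, so each worker serves only one request at a time.
        var end = Date.now() + parseInt(req.params.time, 10);
        while (Date.now() < end) {
            // spin
        }
        res.status(200).json({ result: process.pid });
    });

With this handler each request consumes a full CPU core for its whole duration, which is why adding workers scales the rps almost linearly, unlike in the setTimeout case.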