
I have a simple case where I'm requesting a different upstream proxy server from my Node.js server. With an increase in load I see that requests take a lot of time to execute (though the time my upstream proxy server takes to respond is constant across requests). To demonstrate the issue I've written a sample program below. When I execute it, the first request takes 118ms and the last one takes 10970ms, depending on the website you hit (I've changed the URL to Google; try it out with your favourite website). Note that I'm using async to parallelize my requests.

The question is: why does Node.js take this much time to execute a request when the requests are run in parallel? To give some more context on the infrastructure settings (CentOS 6.5): I have opened up the port range from 1024 to 65535, changed the FIN timeout to 15 seconds and enabled tw_reuse = 1 for sockets in sysctl.conf.
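
For reference, these are roughly the corresponding lines in sysctl.conf; the exact key names below are my assumption of the standard Linux tunables for the settings described above:

# assumed /etc/sysctl.conf entries for the tunings described above
net.ipv4.ip_local_port_range = 1024 65535
net.ipv4.tcp_fin_timeout = 15
net.ipv4.tcp_tw_reuse = 1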

var http = require('http');
var uuid = require('node-uuid');
var async = require('async');

function callExternalUrl(){
    var uniqueId = uuid.v4();
    console.time(uniqueId);
    var options = {
        host: 'google.com',
        port: '80',
        path: '/',
        method: 'GET'
    };
    var req = http.request(options, function(res) {
        var msg = '';
        res.setEncoding('utf8');
        res.on('data', function(chunk) {
            msg += chunk;
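            // note: timeEnd below fires on every 'data' chunk, so it effectively logs time to first byte (and repeats if the response arrives in multiple chunks)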
            console.timeEnd(uniqueId);
        });
        res.on('end', function() {
        });
    });
    req.end();
}

function iterateAsync(callback){
    var iter = [];
    for(var i=0; i<1000; i++){
        iter[i] = i;
    }
    async.each(iter,
        function(item, callback) {
            callExternalUrl();
        },
        function(err) {
            callback(err);
        }
    );
}

iterateAsync(function(){console.log('done');});  

To give more context, below is Ruby code that does the same thing. I understand I can't compare the two languages apples to apples, but the idea is to show the time it takes to execute the same requests in sequence using Ruby. I don't see any increase in response times when the requests go out one after another. So I suspect the parallel requests made with Node are what's slowing down (and the issue is not the server's response time, but the time it takes to send the request out from the machine itself).

require 'rest_client'

1000.times do |number|
  beginning = Time.now
  response = RestClient.get 'http://google.com'
  puts "Time elapsed #{Time.now - beginning} seconds"
end
Kishore Yekkanti
  • Node.js runs in a single thread, use cluster for what you want to "parallelize". – micnic Apr 07 '15 at 07:48
  • Yes, of course, and I know that. The original code runs on 32 cores with the cluster module. But I don't see any major difference in the pattern, where the first request completes extremely fast and the later requests take a lot of time to execute. I suspect this has something to do with the way Node handles opening and closing HTTP connections. It might also be a little related to how the infrastructure needs to be fine-tuned for this kind of HTTP request. – Kishore Yekkanti Apr 07 '15 at 08:04
    @micnic Node can parallelize I/O just fine without having to resort to multiple processes. – robertklep Apr 07 '15 at 10:55

2 Answers


For one, you're not calling the async iterator callback function:

function callExternalUrl(asyncCallback) {
  ...
  res.on('end', function() {
    asyncCallback();
  });
  ...
}

function iterateAsync(callback) {
  var iter = [];
  for(var i=0; i<1000; i++){
      iter[i] = i;
  }
  async.each(iter,
      function(item, asyncCallback) { // <-- HERE
        callExternalUrl(asyncCallback);
      },
      function(err) {
        callback(err);
      }
  );
}
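
For illustration, here is a minimal sketch of capping how many requests are in flight at once on the async side using eachLimit; the limit of 100 is just an assumption, tune it for your setup:

// Same iteration as above, but with at most 100 requests in flight at a time
function iterateAsyncLimited(callback) {
    var iter = [];
    for (var i = 0; i < 1000; i++) {
        iter[i] = i;
    }
    async.eachLimit(iter, 100, // 100 is an illustrative concurrency limit
        function(item, asyncCallback) {
            callExternalUrl(asyncCallback);
        },
        function(err) {
            callback(err);
        }
    );
}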

Also, depending on the Node version you're using, the http module may limit the number of parallel requests being made to a particular hostname:

$ node -pe 'require("http").globalAgent.maxSockets'

On Node 0.10, the default is 5; on Node 0.12, the default is Infinity ("unlimited"). So if you're not on Node 0.12, you should increase that value in your code:

var http = require('http');
http.globalAgent.maxSockets = Infinity;
...
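
Alternatively, a sketch of using a dedicated agent with a finite socket limit instead of changing the global agent (the value 100 is just an illustrative assumption):

var http = require('http');

// Dedicated agent so the socket limit only applies to these requests
var agent = new http.Agent();
agent.maxSockets = 100; // illustrative value; tune for your workload and OS limits

var options = {
    host: 'google.com',
    port: '80',
    path: '/',
    method: 'GET',
    agent: agent // route the request through the dedicated agent
};

var req = http.request(options, function(res) {
    res.resume(); // drain the response
});
req.end();
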
robertklep
  • robertklep: I was calling the callback in my original code; I missed it in the sample code above. As you rightly mentioned, the problem is with maxSockets. I am using Node 0.10, and before I posted this on SO I had changed the setting to 9999999, which somehow didn't work. When I reduced the number to 100, it started taking effect. I should take some time to dig through the Node http code to understand why the higher number is not respected (maybe it's also because of a hard limit on my server h/w). Thank you for your help and the immediate response. – Kishore Yekkanti Apr 14 '15 at 13:45
  • Not to `Infinity`. Maybe something like 200. Don't run into the kernel max-open-files-per-process or -per-user limits. – wberry May 15 '15 at 20:20

I've tried to run your scenario using JXcore (a fork of Node.js, now an open source project on GitHub), which offers multitasking (among many other new features).

var task = function (item) {
  var http = require('http');
  var uuid = require('node-uuid');

  var uniqueId = uuid.v4() + "-" + process.threadId;
  console.time(uniqueId);
  var options = {
    host: 'google.com',
    port: '80',
    path: '/',
    method: 'GET'
  };
  var req = http.request(options, function (res) {
    var msg = '';
    res.setEncoding('utf8');
    res.on('data', function (chunk) {
      msg += chunk;
      console.timeEnd(uniqueId);
    });
    res.on('end', function () {
      process.release();
    });
  });
  req.end();
  process.keepAlive();
};

jxcore.tasks.setThreadCount(4);
console.time("total");
process.on('exit', function () {
  console.timeEnd("total");
});

for (var i = 0; i < 1000; i++)
  jxcore.tasks.addTask(task, i);

The sample is not really optimized, but the total of 1000 requests still runs a little bit faster with JXcore for me (I was able to measure up to a 20% gain on my platform). That may vary depending on the machine, since multitasking uses different threads/instances within one single process (no need for clustering any more). My machine has just 4 threads, which is why I used jxcore.tasks.setThreadCount(4); you can try it with your 32 :)

The way each single request is handled is not significantly different, so I'm not saying that each request takes less time, but the key might be hidden in a different queuing mechanism, as opposed to the "async" module. And, of course, thanks to multitasking.

infografnet