I'm building a system similar to pingdom.com that checks the uptime of around 10k domains every 5 minutes. The checks run on EC2 micro instances. The check URLs and their last check times are stored in MongoDB. A Node process takes the top n checks that have not been processed within the last 5 minutes, and the URL requests are then made asynchronously. I'm using the Node request library, and my URL check code looks like the following:

var request = require("request");

var options = {
    url: url, // the URL of the check currently being processed
    headers: {
        'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8',
        'Accept-Encoding': 'gzip,deflate,sdch',
        'User-Agent': 'Mozilla/5.0 (Windows NT 6.2; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/30.0.1599.17 Safari/537.36'
    },
    timeout: 10000,     // give up after 10 seconds (value is in milliseconds)
    maxRedirects: 10,   // follow at most 10 redirects
    pool: false,
    strictSSL: false    // do not fail the check on invalid SSL certificates
};

request(options, function (error, response, body) {
    ...
});

I've noticed that when I make more than 10 simultaneous uptime-check requests to different domains, the response times for those domains increase as the number of simultaneous requests grows. I expected the response times to stay the same, since Node is asynchronous. I'm considering trying the node-curl library too, but before that I want to confirm whether I'm doing anything wrong here.
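One way to narrow this down is to cap the number of requests in flight and record how long each one takes, then compare the timings at different caps. The following is only a minimal sketch under assumptions: `runWithLimit` is a hypothetical helper, not part of the request library, and each entry in `tasks` is assumed to be a function that returns a Promise (for example, a wrapper around the `request` call above).

```javascript
// Hypothetical helper: run async tasks with at most `limit` in flight,
// recording how long each task took in milliseconds.
// `tasks` is assumed to be an array of functions that each return a Promise.
function runWithLimit(tasks, limit) {
    var results = new Array(tasks.length);
    var next = 0; // index of the next task that has not been started yet

    // Each worker takes the next unstarted task, runs it, then repeats.
    function worker() {
        if (next >= tasks.length) return Promise.resolve();
        var index = next++;
        var started = Date.now();
        return Promise.resolve()
            .then(function () { return tasks[index](); })
            .then(
                function (value) {
                    results[index] = { ok: true, value: value, ms: Date.now() - started };
                },
                function (err) {
                    results[index] = { ok: false, error: err, ms: Date.now() - started };
                }
            )
            .then(worker);
    }

    // Start at most `limit` workers; together they drain the task list.
    var workers = [];
    for (var i = 0; i < Math.min(limit, tasks.length); i++) {
        workers.push(worker());
    }
    return Promise.all(workers).then(function () { return results; });
}
```

Running the same batch of checks with a limit of, say, 5, 10 and 50 and comparing the `ms` values should show whether the slowdown tracks the concurrency level inside one process or comes from somewhere else, such as the instance's bandwidth.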

I've tried tweaking ulimit and the pool.maxConnection limit without any luck. I know that if I increase the number of EC2 instances, I can achieve 10k checks per 5 minutes with acceptable response times, but I guess services like Pingdom have many more checks to deal with, and I'm curious what they do to scale their systems apart from adding more uptime-check instances.

Masum

1 Answer

You may want to try a small cluster of Node processes on one instance, so that each check worker runs in its own process. One process can act as the controller and send commands to the others via Redis pub/sub or some other means. I'm not sure what your bottleneck is, though; it might be a bandwidth issue with the micro instance. You should probably run some tests to identify where the slowest part of the code is.
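As a rough illustration of the controller/worker idea, here is a minimal sketch that uses Node's built-in `cluster` module and its IPC messaging in place of Redis pub/sub. The names `partition`, `startController`, `startWorker` and `checkUrl` are hypothetical, as are the message shapes. Note that this only helps if the instance has more than one core; with a single core, `os.cpus().length` is 1 and only one worker is started.

```javascript
var cluster = require("cluster");
var os = require("os");

// Split `items` round-robin into `parts` arrays of near-equal size.
function partition(items, parts) {
    var shards = [];
    for (var i = 0; i < parts; i++) shards.push([]);
    items.forEach(function (item, index) {
        shards[index % parts].push(item);
    });
    return shards;
}

// Controller: fork one worker per CPU core and send each worker its shard
// once the worker reports that it is ready to receive messages.
function startController(urls) {
    var shards = partition(urls, os.cpus().length);
    shards.forEach(function (shard) {
        var worker = cluster.fork();
        worker.on("message", function (msg) {
            if (msg && msg.type === "ready") {
                worker.send({ type: "check", urls: shard });
            }
        });
    });
}

// Worker: announce readiness, then run `checkUrl` (a hypothetical function
// supplied by the caller) on every URL in the shard it receives.
function startWorker(checkUrl) {
    process.on("message", function (msg) {
        if (msg && msg.type === "check") msg.urls.forEach(checkUrl);
    });
    process.send({ type: "ready" });
}

// Example wiring, left commented out so loading this file does not fork:
// var isController = cluster.isPrimary !== undefined ? cluster.isPrimary : cluster.isMaster;
// if (isController) startController(["http://example.com/"]);
// else startWorker(function (url) { console.log(process.pid, "would check", url); });
```

Round-robin sharding keeps the per-worker load roughly even without any coordination. Swapping the IPC messages for Redis pub/sub would let the controller and workers live on different instances, at the cost of an extra dependency.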

AlexMA