
I have an API, and its request handling works like this:

it does some logic, using about 1 second of CPU time

then it waits for network I/O, and this I/O needs about 1 second too.

So normally this API needs about 2 seconds to respond.

Then I ran a test. I started 10 requests at the same time. EVERY ONE OF THEM took more than 10 seconds to respond.

This test shows that Node finishes the CPU-costly part of all 10 requests first.

WHY? Why doesn't it respond to a request immediately after that request's I/O is done?


Thanks for the comments. I think I need to explain my concern a bit more.

What I'm concerned about is the case where the request count is not 10. If there are 100 requests at the same time, all of them will time out!!

If Node responded to each child I/O event immediately, I think at least 20% of them would not time out.

I think Node needs some event priority mechanism.


router.use('/test/:id', function (req, res) {
    var id = req.param('id');
    console.log('start cpu code for ' + id);
    for (var x = 0; x < 10000; x++) {
        for (var x2 = 0; x2 < 30000; x2++) {
            x2 -= 1;
            x2 += 1;
        }
    }
    console.log('cpu code over for ' + id);
    request('http://terranotifier.duapp.com/wait3sec/' + id, function (a,b,data) {
        // how can I make this code run immediately after the server response to me.
        console.log('IO over for ' + data);
        res.send('over');
    });
});
Junnan Wang
  • If you have a node.js app that is "synchronously waiting" for I/O, that's a serious design problem. node.js apps must handle all I/O asynchronously to have any decent performance. Show us your actual code and you'll likely get much better help. – jfriend00 Sep 15 '14 at 03:48
  • The network I/O events are added to the end of the event loop's queue, after the 10 CPU intensive events have already taken their spot. It simply has to wait its turn. This is what's being referred to by "blocking" -- an event that keeps the event loop noticeably busy and delays other events from being handled. – Jonathan Lonowski Sep 15 '14 at 03:48
  • Thanks for your comments, I've edited the question now and added some explanation about my concern. – Junnan Wang Sep 15 '14 at 03:56
  • @JunnanWang It isn't possible to help you without seeing your code, or at least an example of how you're handling requests. You're probably doing something to block the thread. – Brad Sep 15 '14 at 04:01
  • Are you using synchronous or asynchronous IO in your server code? As I said before, there are very good strategies for dealing with large numbers of requests in node, but it entirely depends upon your code. Not much we can do, but guess and ask questions if you aren't going to show ANY code. – jfriend00 Sep 15 '14 at 04:01
  • With Node, you should avoid long-running, synchronous tasks. Or, at least extract them from the main process. The bulk of CPU-intensive tasks can be moved to secondary processes using [clusters](http://nodejs.org/api/cluster.html) or [forks](http://nodejs.org/api/child_process.html#child_process_child_process_fork_modulepath_args_options), allowing the main process that's handling HTTP to remain light-weight. – Jonathan Lonowski Sep 15 '14 at 04:06
  • Thanks everyone, code is pasted. @JonathanLonowski, yes, you are right. I just wanted to run a test to see how Node handles events. Now it's obvious that every new event is simply added to the end of the queue. But I still hope I can do something to change the event order, so that the child events of older events are handled first. – Junnan Wang Sep 15 '14 at 04:12

1 Answer


Node.js is single threaded. Therefore, as long as a long-running routine is executing, it cannot process any other piece of code. The offending piece of code in this instance is your double for loop, which takes up a lot of CPU time.

To understand what you're seeing, first let me explain how the event loop works.

Node.js's event loop evolved out of JavaScript's event loop, which evolved out of the web browser's event loop. The browser event loop was originally implemented not for JavaScript but to allow progressive rendering of images. The event loop looks a bit like this:

,-> is there anything from the network?
|      |              |
|      no            yes
|      |              |
|      |              '-----------> read network data
|      V                                     |
|  does the DOM need updating? <-------------'
|      |              |
|      no            yes
|      |              |
|      |              v
|      |         update the DOM
|      |              |
'------'--------------'

When JavaScript was added, script processing was simply inserted into the event loop:

,-> is there anything from the network?
|      |              |
|      no            yes
|      |              |
|      |              '-----------> read network data
|      V                                     |
|  any javascript to run? <------------------'
|      |              |
|      no            yes
|      |              '-----------> run javascript
|      V                                     |
|  does the DOM need updating? <-------------'
|      |              |
|      no            yes
|      |              |
|      |              v
|      |         update the DOM
|      |              |
'------'--------------'

When the JavaScript engine runs outside of the browser, as in Node.js, the DOM-related parts are simply removed and the I/O becomes generalized:

,-> any javascript to run?
|      |         |
|      no       yes
|      |         |
|      |         '--------> RUN JAVASCRIPT
|      V                         |
|  is there any I/O? <-----------'
|      |              |
|      no            yes
|      |              |
|      |              v
|      |          read I/O
|      |              |
'------'--------------'

Note that all your JavaScript code is executed in the RUN JAVASCRIPT part.
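You can see that single RUN JAVASCRIPT slot in action with a small standalone sketch (illustrative, not the question's code): a timer due after 0 ms still cannot fire until the synchronous CPU-bound section returns control to the event loop.

```javascript
// Sketch: a timer due after 0 ms cannot fire until the
// synchronous CPU-bound section gives the event loop a turn.
var start = Date.now();
var events = [];

setTimeout(function () {
    events.push('timer');
    console.log('timer fired ' + (Date.now() - start) + ' ms after start');
}, 0);

// Stand-in for the CPU-heavy part: block the only thread for ~200 ms.
while (Date.now() - start < 200) {}
events.push('cpu done');
console.log('cpu section finished at ' + (Date.now() - start) + ' ms');
```

Even though the timer was due immediately, it reports roughly 200 ms, because the while loop occupied the one and only RUN JAVASCRIPT slot the whole time.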

So, what happens with your code when you make 10 connections?

connection1: node accepts your request, processes the double for loops
connection2: node is still processing the for loops, the request gets queued
connection3: node is still processing the for loops, the request gets queued
(at some point the for loop for connection 1 finishes)
node notices that connection2 is queued so connection2 gets accepted,
process the double for loops
    ...
connection10: node is still processing the for loops, the request gets queued
(at this point node is still busy processing some other for loop,
 probably for connection 7 or something)
request1: node is still processing the for loops, the request gets queued
request2: node is still processing the for loops, the request gets queued
(at some point all the connections' for loops finish)
node notices that the response for request1 is queued, so request1 gets processed:
console.log gets printed and res.send('over') gets executed.
    ...
request10: node is busy processing some other request, request10 gets queued
(at some point request10 gets executed)

This is why you see node taking 10+ seconds to answer 10 requests. It's not that the requests themselves are slow, but that their responses are queued behind all the for loops, and the for loops get executed first (because they were queued first in the event loop).
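The queueing described above can be reproduced with a standalone sketch (all names illustrative): three "requests" each queue about 100 ms of CPU work, and an "I/O response" that becomes ready at 50 ms still has to wait behind all of them.

```javascript
var start = Date.now();
var order = [];

// Busy-wait for `ms` milliseconds of pure CPU time.
function busy(ms) {
    var end = Date.now() + ms;
    while (Date.now() < end) {}
}

// Three "requests", each queueing a CPU-bound chunk.
[1, 2, 3].forEach(function (id) {
    setTimeout(function () {
        busy(100);
        order.push('cpu ' + id);
    }, 0);
});

// An "I/O response" that is ready at 50 ms...
setTimeout(function () {
    order.push('io after ' + (Date.now() - start) + ' ms');
}, 50);
```

The I/O callback was due at 50 ms but only runs at roughly 300 ms, after every queued CPU chunk has had its turn: exactly the pattern the 10 requests show.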

To counter this, you should make the for loops asynchronous to give node a chance to process the event loop. You could write them in C and have C run an independent thread for each of them. Or you could use one of the thread modules from npm to run the JavaScript in separate threads. Or you could use worker-threads, a web-worker-like API implemented for Node.js. Or you could fork a cluster of processes to execute them. Or you can simply loop them with setTimeout if parallelism is not critical:

router.use('/test/:id', function (req, res) {
    var id = req.param('id');
    console.log('start cpu code for ' + id);

    // Run `callback` `count` times, yielding to the event loop between
    // iterations; call `done_callback` once the count is exhausted.
    function async_loop (count, callback, done_callback) {
        if (count) {
            callback();
            setTimeout(function () {
                async_loop(count - 1, callback, done_callback);
            }, 1);
        }
        else if (done_callback) {
            done_callback();
        }
    }

    var outer_loop_done = 0;
    var request_sent = 0;
    var x1 = 0;
    var x2 = 0;
    async_loop(10000, function () {
        x1++;
        async_loop(30000, function () {
            x2++;
        }, function () {
            // Only fire the request once, and only after the outer
            // loop has finished queueing all its iterations.
            if (outer_loop_done && !request_sent) {
                request_sent = 1;
                console.log('cpu code over for ' + id);
                request('http://terranotifier.duapp.com/wait3sec/' + id,
                    function (a, b, data) {
                        console.log('IO over for ' + data);
                        res.send('over');
                    }
                );
            }
        });
    }, function () {
        outer_loop_done = 1;
    });
});

The above code will process a response from request() as soon as possible, rather than waiting for all the async_loops to run to completion. It does this without threads (so there is no parallelism), simply by letting the event queue interleave the work.

slebetman
  • Thanks, really great answer. It looks like the trick is to break the CPU-costly code down into small events. I think I can use e.emit and e.on to replace the inner loop and accomplish the same goal. Am I right? – Junnan Wang Sep 16 '14 at 05:54
  • e.emit just executes pending events. e.on just queues functions waiting for an event. You still need something to periodically call e.emit for you in order for all your e.on callbacks to be executed. The only time-related event emitters I know of are setTimeout and setInterval, and you can simply pass callbacks to them. No need to create your own custom events. – slebetman Sep 16 '14 at 20:19