Node.js multithreading using threads-a-gogo

Question

I am implementing a REST service for financial calculation. So each request is supposed to be a CPU intensive task, and I think that the best place to create threads it's in the following function:

exports.execute = function(data, params, f, callback) {

    var queriesList = [];
    var resultList = [];

    for (var i = 0; i < data.lista.length; i++) 
    {
        var query = (function(cod) {

            return function(callbackFlow) {

                params.paramcodneg = cod;

                doCdaQuery(params, function(err, result)
                {
                    if (err) 
                    {
                        return callback({ERROR: err}, null);
                    }

                    f(data, result, function(ret)
                    {
                        resultList.push(ret);
                        callbackFlow();
                    });
                });
            }
        })(data.lista[i]);

        queriesList.push(query);
    }

    flow.parallel(queriesList, function() {
        callback(null, resultList);
    });
};

I don't know what is best, run flow.parallel in a separeted thread or run each function of the queriesList in its own thread. What is best ? And how to use threads-a-gogo module for that ?

I've tried but couldn't write the right code for that.

Thanks in advance. Kleyson Rios.

score 1 · Answer 1 · answered Jan 30 '15 at 18:07

I'll admit that I'm relatively new to node.js and I haven't yet used threads a gogo, but I have had some experience with multi-threaded programming, so I'll take a crack at answering this question.

Creating a thread for every single query (I'm assuming these queries are CPU-bound calculations rather than IO-bound calls to a database) is not a good idea. Creating and destroying threads in an expensive operation, so creating an destroying a group of threads for every request that requires calculation is going to be a huge drag on performance. Too many threads will cause more overhead as the processor switches between them. There isn't any advantage to having more worker threads than processor cores.

Also, if each query doesn't take that much processing time, there will be more time spent creating and destroying the thread than running the query. Most of the time would be spent on threading overhead. In this case, you would be much better off using a single-threaded solution using flow or async, which distributes the processing over multiple ticks to allow the node.js event loop to run.

Single-threaded solutions are the easiest to understand and debug, but if the queries are preventing the main thread from getting other stuff done, then a multi-threaded solution is necessary.

The multi-threaded solution you propose is pretty good. Running all the queries in a separate thread prevents the main thread from bogging down. However, there isn't any point in using flow or async in this case. These modules simulate multi-threading by distributing the processing over multiple node.js ticks and tasks run in parallel don't execute in any particular order. However, these tasks still are running in a single thread. Since you're processing the queries in their own thread, and they're no longer interfering with the node.js event loop, then just run them one after another in a loop. Since all the action is happening in a thread without a node.js event loop, using flow or async in just introduces more overhead for no additional benefit.

A more efficient solution is to have a thread pool hanging out in the background and throw tasks at it. The thread pool would ideally have the same number of threads as processor cores, and would be created when the application starts up and destroyed when the application shuts down, so the expensive creating and destroying of threads only happens once. I see that Threads a Gogo has a thread pool that you can use, although I'm afraid I'm not yet familiar enough with it to give you all the details of using it.

I'm drifting into territory I'm not familiar with here, but I believe you could do it by pushing each query individually onto the global thread pool and when all the callbacks have completed, you'll be done.

The Node.flow module would be handy here, not because it would make processing any faster, but because it would help you manage all the query tasks and their callbacks. You would use a loop to push a bunch of parallel tasks on the flow stack using flow.parallel(...), where each task would send a query to the global threadpool using threadpool.any.eval(), and then call ready() in the threadpool callback to tell flow that the task is complete. After the parallel tasks have been queued up, use flow.join() to run all the tasks. That should run the queries on the thread pool, with the thread pool running as many tasks as it can at once, using all the cores and avoiding creating or destroying threads, and all the queries will have been processed.

Other requests would also be tossing their tasks onto the thread pool as well, but you wouldn't notice that because the request being processed would only get callbacks for the tasks that the request gave to the thread pool. Note that this would all be done on the main thread. The thread pool would do all the non-main-thread processing.

You'll need to do some threads a gogo and node.flow documentation reading and figure out some of the details, but that should give you a head start. Using a separate thread is more complex than using the main thread, and making use of a thread pool is even more complex, so you'll have to choose which one is best for you. The extra complexity might or might not be worth it.

Node.js multithreading using threads-a-gogo

1 Answers1