
The requirement is 1000 concurrent requests per second, with an I/O operation (such as a database query) on each request. Node.js works on an event loop and assigns I/O operations to the thread pool, but the default thread pool size is 4, so at most 4 threads (I/O operations) can run at the same time and the rest have to wait in a queue. Once a thread completes its work, it can pick up the next operation.

Query 1 - Can we increase the thread pool size to N as required, and does this improve performance or degrade it?

Query 2 - How can we achieve the above requirement in Node.js?

Query 3 - Is Node.js an ideal choice for this requirement, or is there a better option such as Golang?

Bhavin
  • I/O operations do not use the thread pool. They run on the main thread. The parallelism of I/O is not bound by threads, so 100 threads are no more efficient than 1 thread. Instead, I/O parallelism is bound by I/O devices: your network card, the number of PCI lanes your CPU supports, the number of DMA channels your CPU supports, etc. – slebetman May 23 '22 at 09:19

3 Answers


Network I/O operations in Node.js run on the main thread.

Yes, Node.js spawns four threads in addition to the main thread, but none of them are used for network I/O such as database operations. The threads are used for:

  1. DNS resolution (because most OSes provide only a synchronous API for it)

  2. File system APIs (because asynchronous file I/O is messy to do cross-platform)

  3. Crypto (because it is CPU-bound)

  4. Zlib (zip compression)

Everything else does not use threads unless you spawn worker_threads yourself. See Node's own documentation for more on this: https://nodejs.org/en/docs/guides/dont-block-the-event-loop/. Do not rely on information that does not come from the Node.js project itself, such as YouTube or Medium articles that say Node I/O uses thread pools; they don't know what they are talking about.

Increasing the thread pool size will not do anything for network I/O because Node.js simply has no code that makes network I/O use the extra threads. If you want to spread the load across multiple processors, you can use clustering. You can write your own clustering code or use the cluster mode of a process manager such as pm2 to pass connections to your processes.

How can node claim to be high performance if it uses only ONE THREAD?

The thing most non-systems programmers don't realize is that waiting for I/O takes exactly zero CPU time. Doing it by spawning threads means you allocate lots and lots of RAM while those threads mostly use zero CPU time (imagine spawning 1024 threads that barely touch the CPU at all). While those threads (or, in Node's case, the main thread) are waiting for 1000 replies from the database, the OS queues the requests into a series of packets and sends them down to your network card, which in turn sends them to the database one bit at a time; yes, I/O at its core is not parallel (unless you use trunking on multiple network cards). So most of the heavy lifting is done by Ethernet while your process is paused by the OS (waiting).

What Node.js does is make another request while the first is still waiting. This is what non-blocking means: Node does not wait for a request to complete before processing all the other requests. This means that, by default, all requests you make in Node.js are concurrent; they don't wait for other requests to finish.
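To make "concurrent by default" concrete, here is a sketch that simulates three 100 ms database queries with setTimeout (the fakeQuery helper is hypothetical, standing in for a driver call): issued together, they all wait in parallel, so the total wall time is about 100 ms rather than 300 ms:

```javascript
// Simulate an I/O-bound query: setTimeout stands in for a database round trip.
const fakeQuery = (id, ms) =>
  new Promise((resolve) => setTimeout(() => resolve(`row ${id}`), ms));

let rows;
let elapsed;
(async () => {
  const t0 = Date.now();
  // All three "queries" are in flight at once; none blocks the others.
  rows = await Promise.all([
    fakeQuery(1, 100),
    fakeQuery(2, 100),
    fakeQuery(3, 100),
  ]);
  elapsed = Date.now() - t0;
  console.log(`${rows.length} rows in ~${elapsed} ms`); // ~100 ms, not ~300 ms
})();
```

Awaiting each fakeQuery in sequence instead would take the sum of the delays, which is exactly the blocking behavior Node avoids.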

On the completion side, any response received from the server triggers Node to search the event queue (really just a set at this point, because any item in it can complete at any time) and find the corresponding callback to call. Executing the callback does take CPU time, but waiting for the network request does not.

This is why systems like Node.js can compete with multi-threaded systems, and in some cases even outperform them: doing everything on the same thread means you don't need locks (mutexes or semaphores), and you avoid the cost of context switching (the OS putting one thread to sleep, copying its register values to RAM, then waking another thread and copying its register values back from RAM).

slebetman
  • Node will start processing all 1000 requests that reach the server. The server has a 4-core CPU and 8 GB RAM, and we run a cluster, so we have 4 node processes. At a time Node can process 4 requests (R1, R2, R3, R4) for I/O operations, and the rest, R5 to R1000, wait in the event queue. Once R1 completes, the next request from the queue, R5, starts, and so on. Is this correct, or how does Node.js handle 1000 concurrent I/O requests per second? – Bhavin Aug 12 '20 at 08:29
  • No. Node will process all 1000 requests (R1, R2 ... R999, R1000) and then wait for the I/O operations. Depending on when the database/server sends back the first response, the result of R1 may be processed **AFTER** Node has processed request R50 or R346 or R1000; it does not depend on Node at all. It depends on when the response arrives back at Node. – slebetman Aug 12 '20 at 08:46
  • This means that once R1 is completed, R54 may be completed, or R800, or R3, depending on which response the database returns to Node first, second, third, etc. The one doing real multithreaded work is the database. Node is just sitting there waiting, doing nothing, with an array (or linked list) of 1000 callbacks waiting to be executed. – slebetman Aug 12 '20 at 08:48
  • How much time will it take? For example, if each request completes its database I/O operation in 1 second, is the total time (number of requests × time) = 1000 × 1 = 1000 seconds? Or will it process 4 requests in parallel since the default thread pool size is 4, so the time is (number of requests / parallel threads) = 1000 / 4 = 250 seconds? – Bhavin Aug 12 '20 at 11:25
  • That's up to your database, not Node.js. For example, say each request completes the operation in 1 second but your database can process 500 requests in parallel, and each response takes 1 ms to construct and send back to the client. Then the database completes in 2 seconds, and 2 seconds + 1000 × 0.001 = 3 seconds on a single thread. With 4 parallel processes it still takes the database 2 seconds, but the responses can be constructed in parallel, so 2 seconds + 250 × 0.001 = 2.25 seconds. – slebetman Aug 12 '20 at 11:43
  • Each request takes 1 millisecond for its database operation, and the database can process 500 queries in parallel: 1000 requests / 500 parallel = 2 milliseconds to complete all 1000 requests, assuming Node takes no time to prepare and send the responses. If Node takes 1 millisecond per response in a single-threaded application, the total would be 2 ms for the database + (1 ms per request × 1000 requests) = 1002 ms. Is that correct? – Bhavin Aug 12 '20 at 12:14
  • @Bhavin Yes. Node's main limitation is anything you do in the callback: reformatting database rows into objects, building a JSON string from objects, etc. These are generally so small that people normally ignore them, but when you are really pushing the limits they do add up. This is when things like clustering can help. When you are pushing your limits there are also other things to consider: the PCI channels your motherboard has (for disk I/O parallelism), network bandwidth, internet bandwidth, etc. – slebetman Aug 14 '20 at 04:23
  • This has been a very insightful read, thanks for the information, but I have an extra question: I have the same server specs as @Bhavin here and am in much the same situation. While running the app under PM2 in `fork mode`, it reaches 100% CPU usage while the other CPU threads are not loaded at all. Could this 100% mean the event loop rather than the actual CPU thread, and how do I make sure Node is using the thread it holds to the max? – mohdule Apr 12 '21 at 20:35
  • @mohdule 100% CPU load is weird. Your node script should use close to 0% CPU. Even under heavy load it should barely go above 10% for a normal server, unless you're doing something like media transcoding or compressing/uncompressing a huge (100MB+) file. If it is stuck at 100% CPU usage, it means you have an infinite loop (or a long-running loop) in your code. – slebetman Apr 13 '21 at 02:34
  • @mohdule Here's a screenshot of the load on one of my live servers. This server runs the POS system as well as the online ordering and fulfillment system for a supermarket. It runs two Java services and one Node service. As you can see, the current CPU load is around 1% to 2%. It's not exactly rush hour as I write this, but there are customers at the checkout -- https://static.rcgroups.net/forums/attachments/2/6/7/7/4/4/a14861913-103-Screenshot%20at%202021-04-13%2010-36-26.png. – slebetman Apr 13 '21 at 02:43
  • After reading your comment I went and checked the code, and found something horrible: the code has a timeout and 10-second intervals that run on each request (this endpoint gets hit thousands of times per hour). I found it was making unnecessary SQL queries on each interval, so it was almost an infinite loop with a huge delay that added up every 10 seconds. I refactored it and now it sits at 20% CPU during peak hours (a massive improvement). I bet there is more stuff like this. Thanks @slebetman – mohdule Apr 13 '21 at 12:06
  • @slebetman - As you mentioned, Node.js spawns four threads in addition to the main thread, but none of them are used for network I/O such as database operations; they are used for DNS resolution, file system APIs, crypto, and zlib. Does that mean a database operation executes on the main thread only, and the main thread is blocked until the DB operation completes? If yes, that would directly impact performance as well. – Bhavin Apr 03 '22 at 05:28
  • @slebetman, what a great and concise explanation! I'll use it for my non-nodejs colleagues a lot! Thank you – Viktor Molokostov Oct 07 '22 at 09:42

Networking in Node.js does not use the thread pool, so changing its size will not affect your network I/O throughput. Networking uses OS APIs that are already asynchronous and non-blocking.

The JavaScript that runs when an incoming request is processed does not use the thread pool either.

Disk I/O does use the thread pool, but if you're only accessing one physical disk drive, you may not benefit much from a larger pool: there's only one physical disk servo, which can only be in one place at a time, so running 20 disk requests in parallel doesn't necessarily help when they are all competing for the same disk head position. In fact, it might even make things worse, as the OS timeslices between all the different threads and the disk head moves around more than is optimal for serving each one.

To serve 1000 requests per second, you will have to benchmark and test to find out where your bottlenecks are. If I had to guess, I'd bet the bottleneck is your database, in which case reconfiguring Node.js settings isn't where you need to concentrate your effort. In any case, only once you've determined where the bottleneck is in your particular application can you properly figure out which options might help with it. Also keep in mind that serving 1000 requests/sec means the JavaScript you run per request can't take more than 1 ms. So you will probably have to cluster your server too, typically one cluster worker per physical CPU core: with an 8-core server, you would set up an 8-process cluster.

For example, if your Node.js process is CPU-limited by running your own JavaScript, then Node.js clustering can get multiple CPUs running different requests. But if the real bottleneck is the database, clustering your Node.js request handlers won't help with the database bottleneck.

Benchmark, measure, form theories from the data about what to change, design specific tests to measure it, then implement one of the theories and measure again. Adjust based on what you measure. You can only really do this properly (without wasting a lot of time in unproductive directions) by measuring first, making the appropriate adjustments, and then measuring your progress.

jfriend00
  • The question is about understanding the event loop with respect to 1000 requests and how it handles them. Let's assume we have a login API that performs a select query in the database; I want to understand how Node.js handles this login API when 1000 requests arrive concurrently in one second. Thank you. – Bhavin Apr 03 '22 at 05:34
  • @Bhavin - This question and answer are from over a year and a half ago, and you've already accepted an answer. Please write a new question with whatever you think is new in what you want to ask. You can drop a few of us a comment asking us to check out your new question if you want, but please put any new parts of your question in a new and separate question. – jfriend00 Apr 03 '22 at 05:46
  • @jfriend00 "Networking uses OS APIs" - does this mean the OS API somehow puts the response from the external API into a callback and also pushes that callback onto the event loop? – Rajat Aggarwal Jul 27 '22 at 16:45
  • @RajatAggarwal - Nodejs native code calls the OS API. When the OS API returns a result (whether synchronous or asynchronous), the nodejs native code puts an event into the event queue for nodejs to process when it gets a turn in the event loop. Different kinds of OS API calls work differently: some are natively asynchronous (networking) and some are not (file I/O), and nodejs native code treats them differently. – jfriend00 Jul 27 '22 at 23:16

Network sockets in libuv are non-blocking (i.e., not on that thread pool). Build a test harness first; you are more than likely fine with the defaults.

To increase the threadpool size, set the environment variable UV_THREADPOOL_SIZE=N up to 1024.

$ node --help
...
Environment variables:
...
UV_THREADPOOL_SIZE            sets the number of threads used in libuv's threadpool
Matt
  • UV_THREADPOOL_SIZE=N up to 1024 requires a well-provisioned machine. I have 4 cores and 8 GB RAM; can we set UV_THREADPOOL_SIZE=1024 on this configuration, or what configuration is required to set it to 1024? – Bhavin Aug 12 '20 at 06:02
  • You would normally run a node process per CPU thread rather than adjust thread pools. JS code is normally the bottleneck. – Matt Aug 12 '20 at 06:09