
Express.js serving a Remix app. The server-side code sets several timers at startup that run various background jobs periodically, one of which checks whether a remote Jenkins build has finished. If so, it copies several large PDFs from one network path to another (both on GSA).

One function creates an array of chained glob+copyFile promises:

  import { copyFile } from 'node:fs/promises';
  import { basename } from 'node:path';
  import { promisify } from 'node:util';
  import glob from 'glob';
  ...
  async function getFiles() {
      let result: Promise<void>[] = [];
      let globPromise = promisify(glob);
      for (let wildcard of wildcards) { // lots of file wildcards here
          result.push(globPromise(wildcard).then(
              (files: string[]) => {
                  if (files.length < 1) {
                      // do error stuff
                  } else {
                      for (let srcFile of files) {
                          let tgtFile = tgtDir + basename(srcFile);
                          return copyFile(srcFile, tgtFile);
                      }
                  }
              },
              (reason: any) => {
                  // do error stuff
              }));
      }
      return result;
  }

Another async function gets that array and does Promise.allSettled on it:

const copyPromises = await getFiles();
console.log("CALLING ALLSETTLED.THEN()...");
return Promise.allSettled(copyPromises).then(
    (results) => {
        console.log("ALLSETTLED COMPLETE...");
        // ...
    }
);
Between the "CALLING" and "COMPLETE" messages, which can take on the order of several minutes, the server no longer responds to browser requests, which time out.

However, during this time my other active backend timers can still be seen running and completing just fine in the server console log (I made one run every 5 seconds for test purposes, and it runs quite smoothly over and over while those file copies are crawling along).

So it's not blocking the server as a whole, it's seemingly just preventing browser requests from being handled. And once the "COMPLETE" message pops up in the log, browser requests are served up normally again.

The Express startup script basically just does this for Remix:

const { createRequestHandler } = require("@remix-run/express");
...
app.all(
    "*",
    createRequestHandler({
        build: require(BUILD_DIR),
        mode: process.env.NODE_ENV,
    })
);

What's going on here, and how do I solve this?

  • I would use `child_process` to run the task in another process – Konrad Oct 08 '22 at 20:10
  • Wow, bizarre! fs.copyFile(srcFile, tgtFile) hoses up the server to HTTP requests, but using child_process.exec("copy " + srcFile + " " + tgtFile) doesn't...at all. The browser requests are handled instantly while it's chugging on all those copies! The latter is OS-dependent, but I can certainly live with that, given how simply (and well) it takes care of the issue. What I still don't understand is...given that Node is reportedly "very good at asynchronous I/O", why does async copyFile effectively block the server? – Ernest Crvich Oct 08 '22 at 21:04
  • Node is running in one thread. It's good for multiple short tasks. If some operation takes a lot of time it will clog. – Konrad Oct 08 '22 at 22:05
  • I don't know remix, what does `createRequestHandler` do? Does it try to serve files from the file system? – Bergi Oct 09 '22 at 03:39
  • "*it copies several large PDFs*" - how many files are we talking about here? – Bergi Oct 09 '22 at 03:43
  • I don't know the implementation details of createRequestHandler, but like most web servers, yes, it serves files from the local file system. Remix is akin to NextJS, a server-render React-based JS framework. Number of PDFs varies based on what got built in Jenkins, but obviously the more I copy, the longer the server is hosed. I mean, I have a solution (which is to do the copy with the OS instead of a JS function), but I still don't understand why that solves the problem. I do realize Node is single-threaded, but fs/promises.copyFile is asynchronous I/O, which Node reportedly excels at. – Ernest Crvich Oct 10 '22 at 15:29
  • I guess you might be running into some OS-imposed limit on open file handles per process, or even the IO limit of your file system, that would impact the web server reading files to serve. Try changing `getFiles` to copy the files sequentially, instead of all at once. – Bergi Oct 10 '22 at 15:50
  • Even if I limit the test case to, say, 5 files, it still has the same request-blocking effect. But Konrad's very first suggestion works best anyway (spawning an OS copy instead of using Node's async copyFile). Just wish I knew why async I/O (A) prevents requests from being handled, but (B) doesn't prevent other server-side timer-scheduled jobs from running just fine. – Ernest Crvich Oct 10 '22 at 17:53
  • Hm, that's unusual – Bergi Oct 10 '22 at 18:24

2 Answers


It's apparent no further discussion is forthcoming, and I've not determined why the async I/O functions block server responses, so I'll post the answer that was essentially Konrad Linkowski's workaround from the comments: use the OS to do the copies instead of copyFile(). In place of the glob+copyFile calls inside getFiles, it boils down to this:

const util = require('node:util');
const exec = util.promisify(require('node:child_process').exec);
...
async function getFiles() {
   ...
   result.push( exec("copy /y " + wildcard + " " + tgtDir) );
   ...
}

This does not exhibit any of the request-crippling behavior; for the entire time the copies are chugging away (many minutes), browser requests are handled instantly.

It's an OS-specific solution and thus non-portable as-is, but that's fine in our case since we will likely be using a Windows server for this app for many years to come. And certainly if needed, runtime OS-detection could be used to make the commands run on other OSes.
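Such runtime OS detection could be sketched like this (the helper name `copyCommand` and the POSIX `cp` fallback are illustrative assumptions, not tested on every platform; quoting may need hardening for paths with spaces):

```javascript
// Sketch: choose an OS-appropriate shell copy command at runtime.
// `copyCommand` is a hypothetical helper, not part of the original code.
function copyCommand(srcWildcard, tgtDir) {
    return process.platform === 'win32'
        ? 'copy /y ' + srcWildcard + ' ' + tgtDir
        : 'cp -f ' + srcWildcard + ' ' + tgtDir; // left unquoted so the shell expands the wildcard
}
```

The returned string would then be passed to the promisified `exec`, e.g. `result.push(exec(copyCommand(wildcard, tgtDir)))`.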


I guess that this is due to Node's libuv using a thread pool with synchronous access for file system operations, and the default pool size is only 4. See https://kariera.future-processing.pl/blog/on-problems-with-threads-in-node-js/ for a demonstration of the problem, or "Nodejs - What is maximum thread that can run same time as thread pool size is four?" for an explanation of how this is normally not a problem in network-heavy applications.

So if you have a filesystem-access-heavy application, try increasing the thread pool by setting the UV_THREADPOOL_SIZE environment variable.
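For instance, the variable has to be set in the environment before Node starts (it cannot be raised from inside the running process); the value 16 and the `server.js` entry point below are illustrative:

```shell
# Sketch: raise libuv's thread pool from the default of 4 before Node starts.
UV_THREADPOOL_SIZE=16 node server.js
```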

Bergi
  • Interesting, thanks for that, may at least explain what's going on. But I keep seeing warnings not to set that var to more than the number of CPU cores you have. In my machine's instance, that's only 6, which isn't going to do any good when the build request built, say, 30 PDFs. Using the OS to do the copies doesn't have that limitation. – Ernest Crvich Oct 19 '22 at 00:17