1

Disclaimer: I'm not experienced with programming or with networks in general, so I might be missing something quite obvious.

So I'm making a function in Node.js that should go over an array of image links from my database and check whether they're still working. There are thousands of links to check, so I can't just fire off several thousand fetch calls at once and wait for the results. Instead I'm staggering the requests, going 10 by 10, and doing HEAD requests to minimize bandwidth usage.
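A minimal sketch of that batching pattern looks roughly like this (the `checkLink` helper is hypothetical, standing in for the real HEAD-request check):

```javascript
// Hypothetical stand-in for the real HEAD-request check.
const checkLink = async url => !url.includes('broken');

// Process the links in sequential batches of `batchSize`, so at most
// `batchSize` requests are in flight at any one time.
async function checkInBatches(urls, batchSize = 10) {
  const results = [];
  for (let i = 0; i < urls.length; i += batchSize) {
    const batch = urls.slice(i, i + batchSize);
    results.push(...await Promise.all(batch.map(url => checkLink(url))));
  }
  return results;
}
```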

I have two issues.

The first one is that after the first 10-20 links are fetched quickly, the subsequent requests take quite a bit longer, and 9 or 10 out of every 10 of them will time out. This might be due to some network mechanism that throttles my requests when many are fired at once, but I suspect it's caused by my second issue.

The second issue is that the checking process slows down after a few iterations. Here's an outline of what I'm doing: I take the string array of image links, slice it 10 by 10, then check those 10 posts in 10 promises (ignore the i and j variables; they're just there to track the individual promises and timeouts for logging/debugging):

const partialResult = await Promise.all(postsToCheck.map(async (post, j) => await this.checkPostForBrokenLink(post, i + j)));

Within checkPostForBrokenLink I race the fetch against a 10-second timeout, because I don't want to wait for the connection itself to time out every time. When timing out is a problem, I give it 10 seconds, then flag it as having timed out and move on.

const timeoutPromise = index => {
    let timeoutRef;
    const promise = new Promise<null>((resolve, reject) => {
        const start = new Date().getTime();
        console.log('===TIMEOUT INIT===' + index);
        timeoutRef = setTimeout(() => {
            const end = new Date().getTime();
            console.log('===TIMEOUT FIRE===' + index, end - start);
            resolve(null);
        }, 10 * 1000);
    });
    return { timeoutRef, promise, index };
};
const fetchAndCancelTimeout = timeout => {
    return fetch(post.fileUrl, { method: 'HEAD' })
        .finally(() => {
            console.log('===CLEAR===' + index); // index is from the parent function
            clearTimeout(timeout);
        });
};
const timeout = timeoutPromise(index);
const videoTest = await Promise.race([fetchAndCancelTimeout(timeout.timeoutRef), timeout.promise]);

If fetchAndCancelTimeout finishes before timeout.promise does, it cancels that timeout, but if the timeout finishes first, the fetch promise is still "resolving" in the background even though the code has moved on. I'm guessing this is why my code is slowing down. The later timeouts take 20-30 seconds from being set up to firing, despite being set to 10 seconds. As far as I know, this has to be because the main process is busy and doesn't have time to execute the event queue, though I don't really know what it could be doing except waiting for the promises to resolve.
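The losing side of a Promise.race really does keep running after the race settles; a tiny demonstration (using plain timers in place of fetch) is:

```javascript
// Demonstration that Promise.race never cancels the loser: the slow
// promise keeps running after the race has already been decided.
const events = [];

const slow = new Promise(resolve =>
  setTimeout(() => { events.push('slow finished'); resolve('slow'); }, 50));
const fast = new Promise(resolve =>
  setTimeout(() => { events.push('fast finished'); resolve('fast'); }, 10));

Promise.race([slow, fast]).then(winner => events.push('race won by ' + winner));

setTimeout(() => console.log(events), 100);
// events ends up as ['fast finished', 'race won by fast', 'slow finished']
```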

So the question is, first off, am I doing something stupid here that I shouldn't be doing and that's causing everything to be slow? Secondly, if not, can I somehow manually stop the execution of the fetch promise if the timeout fires first, so as not to waste resources on a pointless process? Lastly, is there a better way to check whether a large number of links are valid than what I'm doing here?

Supperhero
  • Resolves only when not timedout: https://stackoverflow.com/questions/46946380/fetch-api-request-timeout – Estradiaz Jan 26 '20 at 16:24
  • It is not possible as far as I know. Some time ago I had a similar problem with the HttpClient. My solution was to subscribe to the HTTP request, and if the subscription gets unsubscribed, the request gets cancelled. – Sebastian S. Jan 26 '20 at 16:33
  • 2
    Have you looked at [AbortSignal](https://developer.mozilla.org/en-US/docs/Web/API/AbortSignal)? It is still experimental technology, so I'm not sure that it is available in node.js. – some Jan 26 '20 at 20:34
  • What are you using for `fetch` in node.js? Did you have a look at its documentation regarding request cancellation? – Bergi Jan 26 '20 at 21:50
  • Can you please show the complete code that does the "staggering"? Maybe there's some mistake. – Bergi Jan 26 '20 at 21:52

2 Answers

0

I found the problem, and it wasn't, at least not directly, related to promise buildup. The code shown was for checking video links, but for images the fetch call was done by a plugin, and that plugin was causing the slowdown. When I started using the same code for both videos and images, the process suddenly became orders of magnitude quicker. I didn't think to check the plugin at first because it was only supposed to do a HEAD request and format the results, which shouldn't have been an issue.

For anyone looking at this trying to find a way to cancel a fetch, @some provided an idea that seems like it might work. Check out https://www.npmjs.com/package/node-fetch#request-cancellation-with-abortsignal
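As a hedged sketch of what that cancellation looks like (assuming node-fetch v2+ with AbortController support, or Node 18+ where `fetch` and `AbortController` are globals), the timeout actually tears down the request instead of leaving it running in the background:

```javascript
// HEAD-check a URL, aborting the underlying request if it exceeds `ms`.
// Unlike the Promise.race approach, controller.abort() cancels the
// request itself, so no socket is left open after the timeout fires.
async function headWithTimeout(url, ms = 10000) {
  const controller = new AbortController();
  const timer = setTimeout(() => controller.abort(), ms);
  try {
    const res = await fetch(url, { method: 'HEAD', signal: controller.signal });
    return res.ok;
  } catch (err) {
    // An AbortError here means the timeout fired; any other network
    // error also means the link isn't usable, so flag it as broken.
    return false;
  } finally {
    clearTimeout(timer);
  }
}
```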

Supperhero
0

Something you might want to investigate here is the Bluebird promise library. There are two functions in particular that I believe could simplify your implementation with regard to rate limiting your requests and handling timeouts.

Bluebird's Promise.map has a concurrency option (link), which lets you set the number of concurrent requests, and it also has a Promise.timeout function (link) that returns a rejected promise if a given timeout has elapsed.

jeeves