4

Context:

I have some code that looks roughly like this:

const express = require("express");
const app = express();
const fetch = require('node-fetch');

app.get("/file/:path", async function(request, response) {
  const path = request.params.path;
  
  const reqController = new AbortController();
  const reqTimeout = setTimeout(() => reqController.abort(), 10000);

  const r = await fetch(`https://cdn.example.com/${encodeURIComponent(path)}`, {
    timeout:10000,
    signal: reqController.signal,
  }).catch(e => false).finally(() => clearTimeout(reqTimeout));

  if(r === false || !r.ok) {
    response.send("error");
  } else {
    r.body.pipe(response);
  } 
});

...

So you can see that I'm simply fetching a response with node-fetch and then piping it the the client response stream.

Note that I'm handling the errors better than .catch(e => false) in the real code. It's extremely rare that there are errors (no errors in several months), so I figured I'd leave out those details.

Question:

What are the possible ways in which this code could leak TCP connections or memory? How do I "cover" these cases?

Extra Info:

I'd have thought that by default node-fetch/express would have default timeouts for streaming (based on time since last chunk received) that would prevent leaks, but that doesn't seem to be the case.

I've tried adding code like this:

r.body.on('error', (error) => { 
  response.connection.destroy();
});
response.on('error', (error) => {
  r.body.cancel();
});

r.body.pipe(response);

I don't have a good understanding of the internals of node-fetch and Node.js's streams, so this was just a guess at something that might cover some sort of unusual failure mode, but it turns out that this doesn't fix the problem.

Note that I'm using the timeout option of node-fetch (it's special extension that's not available in the browser fetch). But I've just edited the first code block to show that I'm also using an abort signal in the real/unsimplified code. I originally added abort signals because I thought perhaps there was a bug with the timeout option, but I'm still seeing the connection leak. Looking into it a bit more now I see that the timeout option resets the timeout on a redirect, so perhaps that is one interesting cause of connection leaks (keeps redirecting infinitely? but I'd guess there's a redirect limit anyway), but alas it's not the problem I'm having here because I've also got the abort signal. I'm using node-fetch v2.6.1 and Node.js v14.7.0.

It might be also helpful for the pros here to know that this is what I'm seeing with netstat -an and cat /proc/net/sockstat a few minutes after an app/server restart:

{
  LISTEN: 37,
  TIME_WAIT: 4644,
  ESTABLISHED: 268,
  CLOSE_WAIT: 1,
  SYN_SENT: 1,
  SYN_RECV: 4,
  FIN_WAIT1: 3
}
sockets: used 801
TCP: inuse 94 orphan 4 tw 4642 alloc 312 mem 5305
UDP: inuse 2 mem 1
UDPLITE: inuse 0
RAW: inuse 0
FRAG: inuse 0 memory 0

And after several hours you can see the TCP mem value in /proc/net/sockstat has climbed considerably:

{
  LISTEN: 37,
  TIME_WAIT: 4151,
  CLOSE_WAIT: 37,
  ESTABLISHED: 337,
  LAST_ACK: 1,
  SYN_SENT: 1,
  SYN_RECV: 4
}
sockets: used 897
TCP: inuse 171 orphan 0 tw 4151 alloc 413 mem 63487
UDP: inuse 2 mem 0
UDPLITE: inuse 0
RAW: inuse 0
FRAG: inuse 0 memory 0

After a few days the mem grows to around 600k which is close to the maximum/critical threshold set that I see in /proc/sys/net/ipv4/tcp_mem. The tcp inuse value gets to around 5k. The server handles about a dozen requests per second, so tiny/rare leaks can become significant in a few days.

I've tried manually triggering garbage collection every few minutes in case it's due to weird/bad GC scheduling (just a wild guess), but that didn't help.

joe
  • 3,752
  • 1
  • 32
  • 41
  • You seem to be mixing `await fetch()` with `fetch().catch()`. In async functions you use `try {} catch (err){}` to handle situations where functions like `fetch()` fail. It's possible you have failures, but you're not logging them. – O. Jones Aug 19 '21 at 11:10
  • @O.Jones It would be really surprising to me if `.catch(e => ...)` approach was functionally different to the `try{}catch(e){}` here! I should have noted that in the real code I am logging errors (in fact I've set up a monitoring service to send me an email in case of errors - they're really rare). I'll edit the question now to specify this. – joe Aug 19 '21 at 12:21
  • With your extra info `Streams` should handle syncing between source and destination seamlessly; but if you prefer to end the connection after some time if it is still in progress, then you can use **Request cancellation with AbortSignal** as mentioned on node-fetch GH page. Ending response also would be a call to `response.end()` too. – Javad M. Amiri Aug 24 '21 at 20:04
  • @JavadM.Amiri I'm using `node-fetch`'s timeout option (as shown in my code example), but I also actually have an abort signal set up as well in the real code (I originally added it because I thought there might be a bug with the `timeout` option). I'll add this info to my post. – joe Aug 28 '21 at 00:11

1 Answers1

0

Consider this article to see how to abort a fetch request: https://davidwalsh.name/cancel-fetch

You can set a timeout to abort your fetch call:

fetch(`https://cdn.example.com/${encodeURIComponent(path)}`, { signal }).then(response => {
    console.log(`Request is complete!`);
}).catch(e => {
    if(e.name === "AbortError") {
        // We know it's been canceled!
        console.warn(`Fetch aborted: ${e.message}`);
    }
    else{
      console.warn(`Some other fetch error: ${e.message}`);
    }
});

// Wait 2 seconds to abort the request
setTimeout(() => controller.abort(), 2000);
  • `node-fetch` comes with a `timeout` option which (as shown in original code) I've set to 10000ms. In the actual code I actually have an abort signal *and* the `node-fetch` timeout (since I originally figured that there could be bugs with the `timeout` option), but I'm still seeing tcp connection leaks so I concluded that there was a deeper issue here. I'll add these details to my question post. – joe Aug 28 '21 at 00:08
  • to check if there are any TCP connection leaks I suggest to use Wireshark since it gives you a more comprehensive view of the network activity. Once you know what is the real issue behind this you can handle it in code. https://www.wireshark.org/download.html – Ilyes El Benna Aug 28 '21 at 00:27