Context:
I have some code that looks roughly like this:
const express = require("express");
const app = express();
const fetch = require("node-fetch");

app.get("/file/:path", async function(request, response) {
  const path = request.params.path;
  const reqController = new AbortController();
  const reqTimeout = setTimeout(() => reqController.abort(), 10000);
  const r = await fetch(`https://cdn.example.com/${encodeURIComponent(path)}`, {
    timeout: 10000,
    signal: reqController.signal,
  }).catch(e => false).finally(() => clearTimeout(reqTimeout));
  if(r === false || !r.ok) {
    response.send("error");
  } else {
    r.body.pipe(response);
  }
});
...
So you can see that I'm simply fetching a response with node-fetch and then piping it to the client response stream.
Note that I'm handling errors better than .catch(e => false) in the real code. Errors are extremely rare (none in several months), so I figured I'd leave out those details.
Question:
What are the possible ways in which this code could leak TCP connections or memory? How do I "cover" these cases?
Extra Info:
I'd have thought that by default node-fetch/express would have default timeouts for streaming (based on time since the last chunk was received) that would prevent leaks, but that doesn't seem to be the case.
I've tried adding code like this:
r.body.on('error', (error) => {
  response.connection.destroy();
});
response.on('error', (error) => {
  r.body.cancel();
});
r.body.pipe(response);
I don't have a good understanding of the internals of node-fetch and Node.js's streams, so this was just a guess at something that might cover some sort of unusual failure mode, but it turns out that this doesn't fix the problem.
Note that I'm using the timeout option of node-fetch (it's a special extension that's not available in the browser fetch). But I've just edited the first code block to show that I'm also using an abort signal in the real/unsimplified code. I originally added abort signals because I thought perhaps there was a bug with the timeout option, but I'm still seeing the connection leak. Looking into it a bit more, I see that the timeout option resets the timeout on a redirect, so perhaps that is one interesting cause of connection leaks (infinite redirects? but I'd guess there's a redirect limit anyway). Alas, that's not the problem I'm having here, because I've also got the abort signal. I'm using node-fetch v2.6.1 and Node.js v14.7.0.
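To show the timeout+abort wiring in isolation (no network; just my understanding of the semantics): the signal fires at most once, and clearing the timer once the request settles prevents a stray abort. On Node 14, where AbortController is not yet a global, I pull it from the abort-controller package; the sketch below assumes a newer Node where it is built in.

```javascript
// Isolated demo of the timeout+abort wiring from the handler above
// (no network involved; AbortController is global in newer Node versions).
const controller = new AbortController();
const timer = setTimeout(() => controller.abort(), 10000);

controller.signal.addEventListener("abort", () => {
  console.log("upstream fetch would be aborted here");
});

// Simulate the fetch settling before the timeout fires:
clearTimeout(timer);
console.log("aborted:", controller.signal.aborted);
```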
It might also be helpful for the pros here to know that this is what I'm seeing with netstat -an and cat /proc/net/sockstat a few minutes after an app/server restart:
{
  LISTEN: 37,
  TIME_WAIT: 4644,
  ESTABLISHED: 268,
  CLOSE_WAIT: 1,
  SYN_SENT: 1,
  SYN_RECV: 4,
  FIN_WAIT1: 3
}
sockets: used 801
TCP: inuse 94 orphan 4 tw 4642 alloc 312 mem 5305
UDP: inuse 2 mem 1
UDPLITE: inuse 0
RAW: inuse 0
FRAG: inuse 0 memory 0
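In case it matters: the per-state counts above are just netstat's State column grouped and counted. Roughly like this (a reconstruction in JS, since the app is Node; shown on sample lines rather than live `netstat -an` output, which you'd get via child_process):

```javascript
// Reconstruction of how the per-state counts above are produced:
// group netstat's State column (6th whitespace-separated field) and count.
// Demonstrated on hard-coded sample lines, not live netstat output.
const sample = [
  "tcp 0 0 1.2.3.4:80 5.6.7.8:1111 TIME_WAIT",
  "tcp 0 0 1.2.3.4:80 5.6.7.8:2222 TIME_WAIT",
  "tcp 0 0 1.2.3.4:80 5.6.7.8:3333 ESTABLISHED",
];
const states = {};
for (const line of sample) {
  if (!line.startsWith("tcp")) continue;
  const state = line.trim().split(/\s+/)[5]; // State is the 6th column
  states[state] = (states[state] || 0) + 1;
}
console.log(states); // counts: TIME_WAIT 2, ESTABLISHED 1
```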
And after several hours you can see the TCP mem value in /proc/net/sockstat has climbed considerably:
{
  LISTEN: 37,
  TIME_WAIT: 4151,
  CLOSE_WAIT: 37,
  ESTABLISHED: 337,
  LAST_ACK: 1,
  SYN_SENT: 1,
  SYN_RECV: 4
}
sockets: used 897
TCP: inuse 171 orphan 0 tw 4151 alloc 413 mem 63487
UDP: inuse 2 mem 0
UDPLITE: inuse 0
RAW: inuse 0
FRAG: inuse 0 memory 0
After a few days the mem value grows to around 600k, which is close to the maximum/critical threshold that I see in /proc/sys/net/ipv4/tcp_mem. The TCP inuse value gets to around 5k. The server handles about a dozen requests per second, so tiny/rare leaks can become significant within a few days.
I've tried manually triggering garbage collection every few minutes in case it's due to weird/bad GC scheduling (just a wild guess), but that didn't help.
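The manual GC triggering was along these lines (a sketch, not my exact code, and the 5-minute interval is just illustrative; global.gc only exists when node is started with --expose-gc):

```javascript
// Sketch of the periodic manual GC I tried (requires `node --expose-gc`):
// global.gc is undefined without that flag, so guard before scheduling.
if (typeof global.gc === "function") {
  const gcInterval = setInterval(() => global.gc(), 5 * 60 * 1000);
  gcInterval.unref(); // don't keep the process alive just for this timer
}
console.log("gc exposed:", typeof global.gc === "function");
```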