I have a really strange one I just cannot work out.
I have been building node/express apps for years now and usually run a dev server at home for quick debugging/testing. I front it with a haproxy instance to make it "production-like" and to handle the SSL termination.
In any case, just recently ALL the servers (different projects) started misbehaving and stopped responding to requests almost exactly 5 minutes after being started. That is all 3 or 4 of the instances I sometimes run on this machine, yet the exact same haproxy instance is front-ending the production version of the code, and that has no issues; it's still rock solid. And, infuriatingly, I wrote a really basic express server example, and if it's front-ended by the same haproxy it also locks up, but if I switch it to a port haproxy doesn't touch, it runs fine forever as expected!
So in summary:
- 1x haproxy instance front-ending a bunch of prod/dev instances with the same rule sets, all with SSL
- 2x production instances: working fine
- 4x dev instances (and a simple test program): ALL locking up after around 5 min when behind haproxy
- the same simple test program on a different port (local network only, no haproxy): works perfectly
I also have Uptime Robot liveness checks hitting haproxy to monitor the instances.
So this example:
const express = require('express')
const request = require('request');

const app = express()
const port = 1234

var counter = 0;
var received = 0;

process.on('warning', e => console.warn(e.stack));

const started = new Date();
if (process.pid) {
  console.log('Starting as pid ' + process.pid);
}

app.get('/', (req, res) => {
  res.send('Hello World!')
})

app.get('/livenessCheck', (req, res) => {
  res.send('ok')
})

// catch-all: log anything that didn't match the routes above
app.use((req, res, next) => {
  console.log('unknown', { host: req.headers.host, url: req.url });
  res.send('ok')
})

const server = app.listen(port, () => {
  console.log(`Example app listening on port ${port}`)
})

// these timeouts belong on the http.Server returned by listen(),
// not on the express app, where they would be silently ignored
server.keepAliveTimeout = (5 * 1000) + 1000;
server.headersTimeout = (6 * 1000) + 2000;

// report the number of open connections every 500ms
setInterval(() => {
  server.getConnections(function (error, count) {
    console.log('connections', count);
  });
  //console.log('tick', new Date())
}, 500);

// fire a request at ourselves once a second; crash on the first failure,
// after logging how long the server survived
setInterval(() => {
  console.log('request', new Date())
  request('http://localhost:' + port, function (error, response, body) {
    if (error) {
      const ended = new Date();
      console.error('request error:', ended, error); // print the error if one occurred
      counter = counter - 1;
      if (counter < 0) {
        console.error('started ', started);
        const diff = Math.floor((ended - started) / 1000)
        const min = Math.floor(diff / 60);
        console.error('elapsed ', min, 'min ', diff - min * 60, 'sec');
        process.exit(1); // forced crash for testing
      }
      return;
    }
    received = received + 1;
    console.log('request ', received, 'statusCode:', new Date(), response && response.statusCode);
    //console.log('body:', body);
  });
}, 1000);
works perfectly and runs forever on a non-haproxy port, but only runs for about 5 minutes on a port behind haproxy; it usually gets to 277 request responses each time before hanging up and timing out.
The "exit()" function is just a forced crash for testing.
I've tried adjusting various timeouts on haproxy, but to no avail, and none of those changes had any impact on the production instances, which just keep working fine.
I'm running these dev versions on a Mac Pro (2013) with the latest OS, and I have tried various versions of node.
Any thoughts on what it could be, or how to debug it further?
Oh, and they all serve web sockets as well as http requests.
Here is one example of a haproxy config that I am trying (relevant sections):
global
    log 127.0.0.1 local2
    ...
    nbproc 1
    daemon

defaults
    mode http
    log global
    option httplog
    option dontlognull
    retries 3
    timeout http-request 10s
    timeout queue 1m
    timeout connect 10s
    timeout client 4s
    timeout server 5s
    timeout http-keep-alive 4s
    timeout check 4s
    timeout tunnel 1h
    maxconn 3000

frontend wwws
    bind *:443 ssl crt /etc/haproxy/certs/ no-sslv3
    option http-server-close
    option forwardfor
    reqadd X-Forwarded-Proto:\ https
    reqadd X-Forwarded-Port:\ 443
    http-request set-header X-Client-IP %[src]
    # set HTTP Strict Transport Security (HSTS) header
    rspadd Strict-Transport-Security:\ max-age=15768000
    acl host_working hdr_beg(host) -i working.
    use_backend Working if host_working
    default_backend BrokenOnMac

backend Working
    balance roundrobin
    server working_1 1.2.3.4:8456 check

backend BrokenOnMac
    balance roundrobin
    server broken_1 2.3.4.5:8456 check
So if you go to https://working.blahblah.blah it works forever, but the backend for https://broken.blahblah.blah locks up and stops responding after 5 minutes (it stops responding even to direct curl requests that bypass haproxy).
BUT if I run the EXACT same code on a different port, it responds forever to any direct curl request.
The "production" servers that are working are on various OSes like Centos. On my Mac Pro, I run the tests. The test code works on the Mac on a port NOT front-ended by haproxy. The same test code hangs up after 5 minutes on the Mac when it has haproxy in front.
So the precise configuration that fails is: Mac Pro + any node express app + frontended by haproxy.
If I change anything, like run the code on Centos or make sure there is no haproxy, then the code works perfectly.
So, given that it only stopped working recently, could the latest patch for macOS Monterey (12.6) somehow be interfering with the app's socket when it gets a certain condition from haproxy? It seems highly unlikely, but it's the most logical explanation I can come up with.
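For completeness, here is the probe I intend to run against the hung port the next time it happens, to see whether TCP connections are still being accepted and only the HTTP layer is stalled (just a sketch; the host and port are placeholders for the broken dev instance):

const net = require('net');
const http = require('http');

const host = '127.0.0.1'; // placeholder: the hung dev server
const port = 1234;        // placeholder

// 1) raw TCP: does the OS still accept the connection at all?
const sock = net.connect(port, host, () => {
  console.log('tcp connect ok', new Date());
  sock.end();
});
sock.setTimeout(3000, () => { console.log('tcp connect timed out'); sock.destroy(); });
sock.on('error', (err) => console.log('tcp error', err.code));

// 2) HTTP: does the app still answer a request on a fresh connection?
const req = http.get({ host, port, path: '/livenessCheck', timeout: 3000 }, (res) => {
  console.log('http status', res.statusCode, new Date());
  res.resume();
});
req.on('timeout', () => { console.log('http request timed out'); req.destroy(); });
req.on('error', (err) => console.log('http error', err.code));

If the raw TCP connect succeeds but the HTTP request times out, the listener is fine and the app or its event loop is stuck; if even the TCP connect hangs, that points at something lower down in the OS, which would at least be consistent with the macOS theory.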