
I have a really strange one I just cannot work out.

I have been building node/express apps for years now and usually run a dev server at home for quick debugging/testing. I front it with a haproxy instance to make it "production-like" and to handle the SSL part.

In any case, just recently ALL servers (different projects) started misbehaving and stopped responding to requests almost exactly 5 minutes after being started. That is ALL of the 3 or 4 I sometimes run on this machine, yet the exact same instance of haproxy is front-ending the production version of the code, and that has no issues; it's still rock solid. And, infuriatingly, I wrote a really basic express server example: front-ended by the same haproxy it also locks up, but if I switch ports, it runs fine forever as expected!

So in summary:

1x haproxy instance front-ending a bunch of prod/dev instances with the same rule sets, all with SSL.
2x production instances working fine.
4x dev instances (and a simple test program) ALL locking up after around 5 minutes when behind haproxy.
If I run the simple test program on a different port so it's local network only, it works perfectly.

I also have Uptime Robot liveness checks hitting haproxy to monitor the instances.

So this example:

const express = require('express')
const request = require('request');
const app = express()
const port = 1234

var counter = 0;
var received = 0;

process.on('warning', e => console.warn(e.stack));

const started = new Date();

if (process.pid) {
    console.log('Starting as pid ' + process.pid);
}

app.get('/', (req, res) => {
  res.send('Hello World!').end();
})

app.get('/livenessCheck', (req, res) => {
  res.send('ok').end();
})

app.use((req, res, next) => {
  console.log('unknown', { host: req.headers.host, url: req.url });
  res.send('ok').end();
})

const server = app.listen(port, () => {
  console.log(`Example app listening on port ${port}`)
})

// Note: these settings belong on the http.Server returned by app.listen(), not on the Express app.
server.keepAliveTimeout = (5 * 1000) + 1000;
server.headersTimeout = (6 * 1000) + 2000;

setInterval(() => {
  server.getConnections(function(error, count) {
    console.log('connections', count);
  });
  //console.log('tick', new Date())
}, 500);

setInterval(() => {
  console.log('request', new Date())
  // Hit the server once a second via the loopback interface.
  request('http://localhost:' + port, function (error, response, body) {
    if (error) {
      const ended = new Date();
      console.error('request error:', ended, error); // Print the error if one occurred
      counter = counter - 1;
      if (counter < 0) {
        console.error('started ', started); // Print when the server was started
        const diff = Math.floor((ended - started) / 1000)
        const min = Math.floor(diff / 60);
        console.error('elapsed ', min, 'min ', diff - min * 60, 'sec');
        exit(); // deliberately undefined - forces a crash for testing
      }
      return;
    }
    received = received + 1;
    console.log('request ', received, 'statusCode:', new Date(), response && response.statusCode); // Print the response status code if a response was received
    //console.log('body:', body); // Print the response body
  });
}, 1000);

works perfectly and runs forever on a non-haproxy port, but only runs for approximately 5 minutes on a port behind haproxy; it usually gets to 277 request responses each time before hanging up and timing out.

The "exit()" function is just a forced crash for testing.

I've tried adjusting some timeouts on haproxy, but to no avail. And none of those changes has any impact on the production instances, which just keep working fine.
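As a sanity check (just a sketch of the usual advice, not something I'm sure applies here), the Node-side idle timeouts live on the http.Server returned by app.listen(), and the conventional rule is to keep them longer than the proxy's idle timeouts so haproxy is always the side that closes an idle keep-alive connection first:

const express = require('express');
const app = express();
const port = 1234;

// These idle-timeout settings live on the http.Server returned by app.listen(),
// not on the Express app object.
const server = app.listen(port);

// Keep them longer than haproxy's idle timeouts (timeout client 4s,
// timeout http-keep-alive 4s in the config below) so haproxy closes idle
// keep-alive connections first instead of racing Node for the same socket.
server.keepAliveTimeout = 6 * 1000; // > haproxy's 4s idle timeouts
server.headersTimeout = 8 * 1000;   // should exceed keepAliveTimeout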

I'm running these dev versions on a Mac Pro 2013 with the latest OS, and have tried various versions of node.

Any thoughts on what it could be or how to debug further?
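One thing I can try next (a debugging sketch using standard http.Server / net.Socket events, added to the test program above; not a fix) is to log the lifecycle of every TCP connection the server accepts, to see whether new connections stop arriving at the 5-minute mark or whether sockets are still accepted but then hang:

// Debugging sketch: log the lifecycle of every TCP connection the server accepts.
// If nothing is logged after the hang, connections are no longer reaching Node;
// if 'connection' still fires but requests never complete, the problem is in the
// request handling itself.
let connSeq = 0;
server.on('connection', (socket) => {
  const id = ++connSeq;
  console.log('conn', id, 'open', new Date(), socket.remoteAddress + ':' + socket.remotePort);
  socket.on('timeout', () => console.log('conn', id, 'timeout', new Date()));
  socket.on('error', (err) => console.log('conn', id, 'error', new Date(), err.message));
  socket.on('close', (hadError) => console.log('conn', id, 'close', new Date(), { hadError }));
});

// Also log problems reported by the listening server itself.
server.on('error', (err) => console.error('server error', new Date(), err));
server.on('clientError', (err, socket) => {
  console.error('clientError', new Date(), err.message);
  socket.destroy();
});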

Oh, and they all serve WebSockets as well as HTTP requests.

Here is one example of a haproxy config that I am trying (relevant sections):

global

    log         127.0.0.1 local2

    ...
    
    nbproc  1
    daemon

defaults
    mode                    http
    log                     global
    option                  httplog
    option                  dontlognull

    retries                 3
    timeout http-request    10s
    timeout queue           1m
    timeout connect         10s
    timeout client          4s
    timeout server          5s
    timeout http-keep-alive 4s
    timeout check           4s
    timeout tunnel          1h
    maxconn                 3000

frontend wwws
    bind *:443 ssl crt /etc/haproxy/certs/ no-sslv3

    option http-server-close
    option forwardfor
    reqadd X-Forwarded-Proto:\ https
    reqadd X-Forwarded-Port:\ 443

    http-request set-header X-Client-IP %[src]

    # set HTTP Strict Transport Security (HSTS) header
    rspadd  Strict-Transport-Security:\ max-age=15768000

    acl host_working hdr_beg(host) -i working.

    use_backend Working if host_working

    default_backend BrokenOnMac

backend Working
    balance     roundrobin
    server      working_1 1.2.3.4:8456 check

backend BrokenOnMac
    balance     roundrobin
    server      broken_1 2.3.4.5:8456 check


So if you go to https://working.blahblah.blah it works forever, but the backend for https://broken.blahblah.blah locks up and stops responding after 5 minutes (including direct curl requests bypassing haproxy).

BUT if I run the EXACT same code on a different port, it responds forever to any direct curl request.

The "production" servers that are working are on various OSes like Centos. On my Mac Pro, I run the tests. The test code works on the Mac on a port NOT front-ended by haproxy. The same test code hangs up after 5 minutes on the Mac when it has haproxy in front.

So the precise configuration that fails is: Mac Pro + any node express app + front-ended by haproxy.

If I change anything, like run the code on Centos or make sure there is no haproxy, then the code works perfectly.

So given that it only stopped working recently, could the latest patch for macOS Monterey (12.6) somehow be interfering with the app's socket when it gets a certain condition from haproxy? It seems highly unlikely, but it's the most logical explanation I can come up with.

  • For starters, this `res.send(xxx).end();` should just be `res.send(xxx);` because `res.send()` in Express already calls `.end()` for you. – jfriend00 Oct 19 '22 at 12:29
  • What does your HAProxy config look like? What do the haproxy logs say? If your node app hangs, can you perform a curl request to it? – Marc Oct 19 '22 at 13:19
  • Thanks @jfriend00, however I doubt that would cause the issue. I was also aware of that, but had added it as part of debugging to see if somehow a bug in express was keeping the session open. – Macinspak Oct 20 '22 at 05:57
  • Thanks @Marc. So I have tried various configurations for haproxy. I can share a couple of them, but they were quite different. I am sort of thinking it might be related to timeouts, as I HAVE seen one configuration with different timeouts reduce the alive time down to around 2 minutes, but it seems odd it doesn't auto-recover; it's like the issue locks up a connection or something and it never times out. When it fails, it stops responding to any attempt to connect, including curl. – Macinspak Oct 20 '22 at 05:59
  • As you can see in the example, it creates an express server, then every second it tries to hit the server (forever). This works when on a port that is not front-ended by haproxy, but when active behind haproxy and Uptime Robot, it hangs like clockwork at around 5 min. Then nothing can communicate with it. I tried a heap dump with node inspect, and it was certainly bigger, but not huge - maybe related to a leak of some kind? But printing open sessions every .5 second mostly shows 0, sometimes 1. So it doesn't seem to be leaking connections. – Macinspak Oct 20 '22 at 06:04
  • @Macinspak Now I understand what you said and your example with the request inside your node code. Can you provide the HAProxy config? I want to try to recreate the issue. Have you tried an OS other than OSX, like CentOS/Rocky? – Marc Oct 20 '22 at 06:49
  • @Marc thanks again. I've added one of the haproxy configs (important sections only) to the question. And detail on the OSes. – Macinspak Oct 20 '22 at 22:44
