
I've been banging my head against the wall for 3 days now and counting. I hope you can tell me what I am doing wrong.

I have a Redis cluster with 3 masters and 3 slaves. A Node.js application uses Redis to store and serve data quickly. All seemed fine; however, we experienced occasional drops in the application. It turned out the application was frequently restarting itself. At the time of a restart, the following messages are logged by the application (not by Redis):

2020-08-05 08:48 +02:00: events.js:288
2020-08-05 08:48 +02:00:       throw er; // Unhandled 'error' event
2020-08-05 08:48 +02:00:       ^
2020-08-05 08:48 +02:00: 
2020-08-05 08:48 +02:00: Error: connect EMFILE 10.0.0.201:7101 - Local (undefined:undefined)
2020-08-05 08:48 +02:00:     at internalConnect (net.js:917:16)
2020-08-05 08:48 +02:00:     at defaultTriggerAsyncIdScope (internal/async_hooks.js:311:12)
2020-08-05 08:48 +02:00:     at net.js:1007:9
2020-08-05 08:48 +02:00:     at processTicksAndRejections (internal/process/task_queues.js:79:11)
2020-08-05 08:48 +02:00:     at runNextTicks (internal/process/task_queues.js:66:3)
2020-08-05 08:48 +02:00:     at listOnTimeout (internal/timers.js:518:9)
2020-08-05 08:48 +02:00:     at processTimers (internal/timers.js:492:7)
2020-08-05 08:48 +02:00: Emitted 'error' event on Socket instance at:
2020-08-05 08:48 +02:00:     at emitErrorNT (internal/streams/destroy.js:92:8)
2020-08-05 08:48 +02:00:     at emitErrorAndCloseNT (internal/streams/destroy.js:60:3)
2020-08-05 08:48 +02:00:     at processTicksAndRejections (internal/process/task_queues.js:84:21)
2020-08-05 08:48 +02:00:     at runNextTicks (internal/process/task_queues.js:66:3)
2020-08-05 08:48 +02:00:     at listOnTimeout (internal/timers.js:518:9)
2020-08-05 08:48 +02:00:     at processTimers (internal/timers.js:492:7) {
2020-08-05 08:48 +02:00:   errno: 'EMFILE',
2020-08-05 08:48 +02:00:   code: 'EMFILE',
2020-08-05 08:48 +02:00:   syscall: 'connect',
2020-08-05 08:48 +02:00:   address: '10.0.0.201',
2020-08-05 08:48 +02:00:   port: 7101
2020-08-05 08:48 +02:00: }
2020-08-05 08:48 +02:00: events.js:288
2020-08-05 08:48 +02:00:       throw er; // Unhandled 'error' event
2020-08-05 08:48 +02:00:       ^
2020-08-05 08:48 +02:00: 
2020-08-05 08:48 +02:00: Error [ERR_IPC_CHANNEL_CLOSED]: Channel closed
2020-08-05 08:48 +02:00:     at ChildProcess.target.send (internal/child_process.js:679:16)
2020-08-05 08:48 +02:00:     at Worker.send (internal/cluster/worker.js:45:28)
2020-08-05 08:48 +02:00:     at Server.<anonymous> (/totally/secret/folder/structure/app.js:136:16)
2020-08-05 08:48 +02:00:     at Server.emit (events.js:311:20)
2020-08-05 08:48 +02:00:     at TCP.onconnection (net.js:1554:8)
2020-08-05 08:48 +02:00: Emitted 'error' event on Worker instance at:
2020-08-05 08:48 +02:00:     at ChildProcess.<anonymous> (internal/cluster/worker.js:29:12)
2020-08-05 08:48 +02:00:     at ChildProcess.emit (events.js:311:20)
2020-08-05 08:48 +02:00:     at internal/child_process.js:683:35
2020-08-05 08:48 +02:00:     at processTicksAndRejections (internal/process/task_queues.js:79:11) {
2020-08-05 08:48 +02:00:   code: 'ERR_IPC_CHANNEL_CLOSED'
2020-08-05 08:48 +02:00: }
2020-08-05 08:48 +02:00: WARNING: NODE_APP_INSTANCE value of '0' did not match any instance config file names.
2020-08-05 08:48 +02:00: WARNING: See https://github.com/lorenwest/node-config/wiki/Strict-Mode
2020-08-05 08:48 +02:00: WARNING: NODE_APP_INSTANCE value of '0' did not match any instance config file names.
2020-08-05 08:48 +02:00: WARNING: See https://github.com/lorenwest/node-config/wiki/Strict-Mode
2020-08-05 08:48 +02:00: WARNING: NODE_APP_INSTANCE value of '0' did not match any instance config file names.
2020-08-05 08:48 +02:00: WARNING: See https://github.com/lorenwest/node-config/wiki/Strict-Mode
2020-08-05 08:48 +02:00: WARNING: NODE_APP_INSTANCE value of '0' did not match any instance config file names.
2020-08-05 08:48 +02:00: WARNING: See https://github.com/lorenwest/node-config/wiki/Strict-Mode
2020-08-05 08:48 +02:00: WARNING: NODE_APP_INSTANCE value of '0' did not match any instance config file names.
2020-08-05 08:48 +02:00: WARNING: See https://github.com/lorenwest/node-config/wiki/Strict-Mode
2020-08-05 08:48 +02:00: (node:28699) UnhandledPromiseRejectionWarning: TypeError: Cannot read property 'emit' of undefined
2020-08-05 08:48 +02:00:     at /totally/secret/folder/structure/app.js:323:34
2020-08-05 08:48 +02:00:     at runMicrotasks (<anonymous>)
2020-08-05 08:48 +02:00:     at processTicksAndRejections (internal/process/task_queues.js:97:5)
2020-08-05 08:48 +02:00: (node:28699) UnhandledPromiseRejectionWarning: Unhandled promise rejection. This error originated either by throwing inside of an async function without a catch block, or by rejecting a promise which was not handled with .catch(). To terminate the node process on unhandled promise rejection, use the CLI flag `--unhandled-rejections=strict` (see https://nodejs.org/api/cli.html#cli_unhandled_rejections_mode). (rejection id: 530)
2020-08-05 08:48 +02:00: (node:28699) [DEP0018] DeprecationWarning: Unhandled promise rejections are deprecated. In the future, promise rejections that are not handled will terminate the Node.js process with a non-zero exit code.
2020-08-05 08:48 +02:00: (node:28699) MaxListenersExceededWarning: Possible EventEmitter memory leak detected. 201 error listeners added to [Cluster]. Use emitter.setMaxListeners() to increase limit

What I've tried so far is disabling either AOF or RDB, but with no success; the occasional restarts still happen. What else could be the cause of this issue? The developers say the application simply cannot connect to Redis and therefore fails. But that is impossible: pinging the servers and watching the service uptime confirms 100% that the servers themselves are always reachable and available to the application. All 6 of them.
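Side note: what actually kills the process in the log above is the unhandled 'error' event, not the EMFILE itself. A minimal guard, sketched here assuming ioredis as the client library (the actual client is not shown in the question), at least keeps the worker alive long enough to log the failure:

    // Minimal sketch, assuming ioredis; the host/port are one of the
    // six cluster nodes seen in the log above.
    const Redis = require('ioredis');

    const redis = new Redis.Cluster([
      { host: '10.0.0.201', port: 7101 },
    ]);

    // Without an 'error' listener, a failed connect (e.g. EMFILE) is
    // re-thrown by the EventEmitter and crashes the worker, exactly as
    // in the stack trace above.
    redis.on('error', (err) => {
      console.error('redis error:', err.code, err.message);
    });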

  • How do you start the Node.js app? – Michael Hampton Aug 05 '20 at 09:59
  • in pm2 with an `echosystem.config.js` file. But that just executes (and keeps restarting) the following command: `node app.js --time` – Bert Aug 05 '20 at 11:19
  • Eh? Why pm2 and not systemd? Anyway, EMFILE means "Too many open files" which in the context of a network connection suggests that it has opened too many connections already. Is something in your code opening hundreds or thousands of connections (possibly by mistake)? – Michael Hampton Aug 05 '20 at 11:34
  • It is indeed opening hundreds, possibly thousands, of connections. However, that is by design, not a fault. The goal is to monitor hardware information, refreshed every second, on a web application. – Bert Aug 05 '20 at 11:41
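To make the pattern discussed in this comment thread concrete, here is a hedged sketch of the difference between opening a client per poll and reusing one long-lived client (ioredis assumed again; the function names are illustrative, not from the app):

    const Redis = require('ioredis');

    // Anti-pattern: one new client (one new socket and file descriptor)
    // per one-second refresh. Descriptors accumulate toward the ulimit.
    async function readStatLeaky(key) {
      const client = new Redis.Cluster([{ host: '10.0.0.201', port: 7101 }]);
      try {
        return await client.get(key);
      } finally {
        client.disconnect(); // even with cleanup, the churn is costly
      }
    }

    // Preferred: a single long-lived client shared by every poll.
    const shared = new Redis.Cluster([{ host: '10.0.0.201', port: 7101 }]);
    async function readStat(key) {
      return shared.get(key);
    }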

1 Answer


The error EMFILE means "Too many open files". Opening a network connection requires a file descriptor, so since you are opening thousands of simultaneous connections, you are running into the open files ulimit, which by default is 1024.
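To confirm the process really is brushing against that limit, both the descriptor count and the limit the process actually inherited can be read from `/proc` (a sketch; Linux-only):

    // Linux-only sketch: compare this process's open descriptors against
    // the soft limit it actually inherited (which may differ from what an
    // interactive shell reports).
    const fs = require('fs');

    const openFds = fs.readdirSync('/proc/self/fd').length;
    const limitLine = fs.readFileSync('/proc/self/limits', 'utf8')
      .split('\n')
      .find((line) => line.startsWith('Max open files'));

    console.log(`open fds: ${openFds}`);
    console.log(limitLine); // e.g. "Max open files  1024  1048576  files"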

You can raise this by setting `MAX_OPEN_FILES=` in `/etc/default/pm2` and restarting pm2 and your app:

    MAX_OPEN_FILES=8192
  • So like `~/.nvm/versions/node/v12.16.1/bin/pm2` for me? Also, the ulimit has already been raised to 65535: `ulimit -a` -> `open files (-n) 65535` – Bert Aug 05 '20 at 12:01
  • The path is correct unless you or someone else altered pm2 to look elsewhere. And how did you determine that the ulimit was raised? That looks like you just ran `ulimit -a` from your shell. – Michael Hampton Aug 05 '20 at 13:13
  • Raised with `vim /etc/security/limits.conf` -> `web soft nofile 65536` and `web hard nofile 100000` (those are two separate lines). `web` is the user who actually runs the Node application with pm2. – Bert Aug 05 '20 at 13:16
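Worth noting for anyone landing here: `/etc/security/limits.conf` is applied by PAM to new login sessions, so an already-daemonized pm2 keeps whatever limit it was started with, and running `ulimit -a` in a fresh shell proves nothing about the running app. A sketch to check the live process instead (Linux-only; pass the pid shown by `pm2 list`):

    // Sketch: read the limits of the *running* app by pid, e.g.
    //   node check-limits.js 28699
    // (take the pid from `pm2 list`).
    const fs = require('fs');

    const pid = process.argv[2];
    const limitLine = fs.readFileSync(`/proc/${pid}/limits`, 'utf8')
      .split('\n')
      .find((line) => line.startsWith('Max open files'));

    console.log(limitLine);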