0

When I use PM2 to run multiple processes (i.e. cluster mode) and one of those processes encounters an uncaught error, PM2 does not restart that process.
Why?
How do I make it restart workers in cluster mode?

Example code

// index.js
let counter = 0;
setInterval(function(){
  if(counter >= 5) {
    throw new Error('Worker crash. Why no restart?');
  }
  counter++;
  console.log('Worker alive: ' + Date.now() );
},500);

Run on the command line:

pm2 start index.js -i 4
pm2 log

Eventually all the workers crash and never restart.
What's the point of automatic restarts if they only work for a single process?
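
For reference, the same launch could also be written as a PM2 ecosystem file. This is only a minimal sketch: the file name `ecosystem.config.js` follows PM2's convention, and the app name `worker-demo` is a made-up placeholder. It restates the command above with `autorestart` spelled out (it is PM2's default) and is not claimed to change the behavior described here.

```javascript
// ecosystem.config.js
// Sketch of the `pm2 start index.js -i 4` launch as a config file.
const config = {
  apps: [
    {
      name: 'worker-demo',   // hypothetical app name, pick your own
      script: './index.js',  // the script from the question
      instances: 4,          // same as `-i 4`
      exec_mode: 'cluster',  // run the instances in PM2's cluster mode
      autorestart: true,     // PM2's default, stated explicitly
    },
  ],
};

module.exports = config;
```

It would be started with `pm2 start ecosystem.config.js`.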

pm2 logs (merged into single file)

Worker alive: 1522937847186
Worker alive: 1522937847231
Worker alive: 1522937847276
Worker alive: 1522937847324
Worker alive: 1522937847691
Worker alive: 1522937847736
Worker alive: 1522937847781
Worker alive: 1522937847830
Worker alive: 1522937848193
Worker alive: 1522937848238
Worker alive: 1522937848283
Worker alive: 1522937848332
Worker alive: 1522937848693
Worker alive: 1522937848738
Worker alive: 1522937848783
Worker alive: 1522937848832
Worker alive: 1522937849194
Worker alive: 1522937849238
Worker alive: 1522937849284
Worker alive: 1522937849333
Error: Worker crash. Why no restart?
    at Timeout._onTimeout (/home/usrname/docs/Projects_NodeJS/project/app/index.js:49:11)
    at ontimeout (timers.js:466:11)
    at tryOnTimeout (timers.js:304:5)
    at Timer.listOnTimeout (timers.js:267:5)
Error: Worker crash. Why no restart?
    at Timeout._onTimeout (/home/usrname/docs/Projects_NodeJS/project/app/index.js:49:11)
    at ontimeout (timers.js:466:11)
    at tryOnTimeout (timers.js:304:5)
    at Timer.listOnTimeout (timers.js:267:5)
Error: Worker crash. Why no restart?
    at Timeout._onTimeout (/home/usrname/docs/Projects_NodeJS/project/app/index.js:49:11)
    at ontimeout (timers.js:466:11)
    at tryOnTimeout (timers.js:304:5)
    at Timer.listOnTimeout (timers.js:267:5)
Error: Worker crash. Why no restart?
    at Timeout._onTimeout (/home/usrname/docs/Projects_NodeJS/project/app/index.js:49:11)
    at ontimeout (timers.js:466:11)
    at tryOnTimeout (timers.js:304:5)
    at Timer.listOnTimeout (timers.js:267:5)
Thiago P
  • Can you show the pm2 logs? I just tried this and pm2 is restarting for me. – AbhinavD Apr 05 '18 at 00:01
  • @AbhinavD I've added the log file, though it does not provide any useful information as far as I can see. All the workers crash and then everything just stops. pm2 still shows the workers as online. – Thiago P Apr 05 '18 at 14:26
  • This is weird. The only reason I can think of is that PM2 thinks the node process is still running and has not exited yet. Can you check that your code is not catching the exception? Another thing you can try is calling `process.exit(1);` on the exception to make sure the process dies, and then check whether the process restarts. If it does, PM2 is doing the right thing. Also, what is the output of `pm2 status`? – AbhinavD Apr 05 '18 at 16:20
  • @AbhinavD I figured it out, sort of. I was using Node version 9, but when I downgraded to Node version 8 it worked. I don't know why, but it works! Also, using `process.exit(1)` (or even `process.exit(0)`) did not fix it, and I was not catching the exception. If I ran the program without PM2 it would crash and exit as usual. – Thiago P Apr 05 '18 at 18:17
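
The `process.exit(1)` suggestion from the comments can be sketched roughly as below. `installCrashHandler` is a hypothetical helper name, not part of PM2 or Node, and as noted in the last comment this did not change the behavior for the asker on Node 9. It only shows what "exit on the exception" looks like in code.

```javascript
// Sketch: force the process to exit with a non-zero code when an exception
// goes uncaught, so a process manager can observe the exit.
// `installCrashHandler` is a hypothetical helper, not a PM2 or Node API.
function installCrashHandler(proc = process) {
  proc.on('uncaughtException', (err) => {
    // Log the failure before exiting so it still shows up in the logs.
    console.error('Uncaught exception, exiting:', err && err.stack ? err.stack : err);
    proc.exit(1); // non-zero exit code marks the exit as a crash
  });
}

// In the application, call it once at startup:
//   installCrashHandler();
```

The `proc` parameter defaults to the real `process` object. It is there only so the handler can be exercised with a stand-in object.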

2 Answers

1

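You can try managing the workers yourself with Node's built-in cluster module, forking a replacement whenever a worker exits, as in the code below. Let me know if it helps.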

const cluster = require('cluster');
const numOfCPUs = require('os').cpus().length;

if (cluster.isMaster) {
  console.log(`Master ${process.pid} is running`);

  // Fork workers.
  for (let i = 0; i < numOfCPUs; i++) {
    cluster.fork();
  }

  cluster.on('exit', (worker, code, signal) => {
    console.log('worker %d died (%s). restarting...',
        worker.process.pid, signal || code);
    cluster.fork();
  });
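} else {
  // Worker: run the application code (assumed here to be the question's index.js).
  require('./index.js');
}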
Ishan Koul
0

Downgrading to Node version 8 LTS seems to have fixed the problem.

I had Node version 9 installed, and the problem occurred on both Windows and Ubuntu. After I downgraded to version 8, everything worked.
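
If it helps, a startup check along these lines can flag the affected setup. This is only a sketch based on my own observation (Node 9 failed, Node 8 worked); I don't know the underlying cause, and `isAffectedNodeVersion` is a hypothetical helper name.

```javascript
// Sketch: warn when running on Node 9, the major version where workers
// were not restarted in my tests. Other versions were not checked.
// `isAffectedNodeVersion` is a hypothetical helper name.
function isAffectedNodeVersion(version = process.versions.node) {
  const major = parseInt(version.split('.')[0], 10);
  return major === 9;
}

if (isAffectedNodeVersion()) {
  console.warn(
    'Running on Node ' + process.versions.node +
    '; PM2 cluster workers may not restart after a crash. Consider Node 8 LTS.'
  );
}
```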

Thiago P