I have an app served by Hypnotoad, with no reverse proxy. It has 15 workers, with 2 clients allowed apiece. The app is launched via hypnotoad in foreground mode.

I am seeing the following in the log/production.log:

[Wed Apr  1 16:28:12 2015] [error] Worker 119914 has no heartbeat, restarting.
[Wed Apr  1 16:28:21 2015] [error] Worker 119910 has no heartbeat, restarting.
[Wed Apr  1 16:28:21 2015] [error] Worker 119913 has no heartbeat, restarting.
[Wed Apr  1 16:28:22 2015] [error] Worker 119917 has no heartbeat, restarting.
[Wed Apr  1 16:28:22 2015] [error] Worker 119909 has no heartbeat, restarting.
[Wed Apr  1 16:28:27 2015] [error] Worker 119907 has no heartbeat, restarting.
[Wed Apr  1 16:28:34 2015] [error] Worker 119905 has no heartbeat, restarting.
[Wed Apr  1 16:28:42 2015] [error] Worker 119904 has no heartbeat, restarting.
[Wed Apr  1 16:30:12 2015] [error] Worker 119912 has no heartbeat, restarting.
[Wed Apr  1 16:31:23 2015] [error] Worker 119918 has no heartbeat, restarting.
[Wed Apr  1 16:32:18 2015] [error] Worker 119911 has no heartbeat, restarting.
[Wed Apr  1 16:32:22 2015] [error] Worker 119916 has no heartbeat, restarting.

However, the workers are never restarted.

When I run an strace, the manager process appears to be valiantly trying to kill the (now expired) workers:

Process 119878 attached - interrupt to quit
restart_syscall(<... resuming interrupted call ...>) = 0
kill(119906, SIGKILL)                   = 0
kill(119917, SIGKILL)                   = 0
kill(119905, SIGKILL)                   = 0
kill(119910, SIGKILL)                   = 0
kill(119904, SIGKILL)                   = 0
kill(119914, SIGKILL)                   = 0
kill(119916, SIGKILL)                   = 0
kill(119908, SIGKILL)                   = 0
kill(119913, SIGKILL)                   = 0
kill(119915, SIGKILL)                   = 0
kill(119918, SIGKILL)                   = 0
kill(119912, SIGKILL)                   = 0
kill(119909, SIGKILL)                   = 0
kill(119911, SIGKILL)                   = 0
kill(119907, SIGKILL)                   = 0
stat("/xxx/xxx/xxx/hypnotoad.pid", {st_mode=S_IFREG|0644, st_size=6, ...}) = 0
poll([{fd=4, events=POLLIN|POLLPRI}], 1, 1000) = 0 (Timeout)
kill(119906, SIGKILL)                   = 0
kill(119917, SIGKILL)                   = 0
kill(119905, SIGKILL)                   = 0
kill(119910, SIGKILL)                   = 0
kill(119904, SIGKILL)                   = 0
kill(119914, SIGKILL)                   = 0
kill(119916, SIGKILL)                   = 0
kill(119908, SIGKILL)                   = 0
kill(119913, SIGKILL)                   = 0
kill(119915, SIGKILL)                   = 0
kill(119918, SIGKILL)                   = 0
kill(119912, SIGKILL)                   = 0
kill(119909, SIGKILL)                   = 0
kill(119911, SIGKILL)                   = 0
kill(119907, SIGKILL)                   = 0
stat("/xxx/xxx/xxx/hypnotoad.pid", {st_mode=S_IFREG|0644, st_size=6, ...}) = 0
poll([{fd=4, events=POLLIN|POLLPRI}], 1, 1000^C <unfinished ...>
Process 119878 detached

How can I troubleshoot this further to determine:

  1. Why does Hypnotoad think it still needs to kill non-existent processes?
  2. Why isn't it starting new ones?
arafeandur

1 Answer

What does "Worker 31842 has no heartbeat, restarting" mean?

As long as they are accepting new connections, worker processes of all built-in preforking web servers send heartbeat messages to the manager process at regular intervals, to signal that they are still responsive. A blocking operation such as an infinite loop in your application can prevent this, and will force the affected worker to be restarted after a timeout. This timeout defaults to 20 seconds and can be extended with the attribute "heartbeat_timeout" in Mojo::Server::Prefork if your application requires it.

http://mojolicio.us/perldoc/Mojolicious/Guides/FAQ#What-does-Worker-31842-has-no-heartbeat-restarting-mean
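
As a rough illustration of the setting mentioned above, here is a minimal sketch of a Mojolicious::Lite app that raises the heartbeat timeout through its Hypnotoad configuration. The worker and client counts come from the question; the 60-second value and the trivial route are assumptions made for the example, not a recommendation. A longer timeout only hides a blocking operation, it does not remove it.

```perl
#!/usr/bin/env perl
use Mojolicious::Lite;

# Hypnotoad reads its settings from the "hypnotoad" key of the app config.
app->config(hypnotoad => {
  workers           => 15,  # from the question
  clients           => 2,   # from the question
  heartbeat_timeout => 60,  # assumed value; the FAQ gives 20 seconds as the default
});

# Placeholder route so the sketch is a complete app (hypothetical).
get '/' => {text => 'ok'};

app->start;
```

If the app already loads a configuration file, the same "hypnotoad" hash can presumably go there instead of being set in code.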

Joel Berger
  • Thanks for the insight. I really enjoyed your Mojolicious non-blocking articles. I will investigate blocking loops in my application. Any tips on that would be appreciated. – arafeandur Apr 14 '15 at 14:54