
I am receiving R12 Exit Timeout errors for a Heroku app running Unicorn and Sidekiq. These errors occur 1-2 times a day and whenever I deploy. I understand that I need to translate the shutdown signals Heroku sends so that Unicorn responds correctly, but I thought I had done that in the unicorn config below:

worker_processes 3
timeout 30
preload_app true

before_fork do |server, worker|
  Signal.trap 'TERM' do
    puts "Unicorn master intercepting TERM and sending myself QUIT instead. My PID is #{Process.pid}"
    Process.kill 'QUIT', Process.pid
  end

  if defined?(ActiveRecord::Base)
    ActiveRecord::Base.connection.disconnect!
    Rails.logger.info('Disconnected from ActiveRecord')
  end
end

after_fork do |server, worker|
  Signal.trap 'TERM' do
    puts "Unicorn worker intercepting TERM and doing nothing. Wait for master to sent QUIT. My PID is #{Process.pid}"
  end

  if defined?(ActiveRecord::Base)
    ActiveRecord::Base.establish_connection
    Rails.logger.info('Connected to ActiveRecord')
  end

  Sidekiq.configure_client do |config|
    config.redis = { :size => 1 }
  end
end

My logs surrounding the error look like this:

Stopping all processes with SIGTERM
Unicorn worker intercepting TERM and doing nothing. Wait for master to send QUIT. My PID is 7
Unicorn worker intercepting TERM and doing nothing. Wait for master to send QUIT. My PID is 11
Unicorn worker intercepting TERM and doing nothing. Wait for master to send QUIT. My PID is 15
Unicorn master intercepting TERM and sending myself QUIT instead. My PID is 2
Started GET "/manage"
reaped #<Process::Status: pid 11 exit 0> worker=1
reaped #<Process::Status: pid 7 exit 0> worker=0
reaped #<Process::Status: pid 15 exit 0> worker=2
master complete
Error R12 (Exit timeout) -> At least one process failed to exit within 10 seconds of SIGTERM
Stopping remaining processes with SIGKILL
Process exited with status 137

It appears that all of the child processes were successfully reaped before the timeout. Is it possible the master is still alive? Also, should the router still be sending web requests to the dyno during shutdown, as shown in the logs?

FWIW, I'm using Heroku's zero downtime deployment plugin (https://devcenter.heroku.com/articles/labs-preboot/).

middkidd
    If it helps, I am also experiencing this problem _without_ the zero downtime deployment plugin. I hope someone can help or you can post an answer if you figure it out. Perhaps contact Heroku support? – Chris Peters Jul 10 '13 at 11:16
  • Just like Chris, I'm not using zero downtime, and am experiencing this issue. This is in spite of using Heroku's recommended unicorn config. – imderek Jul 12 '13 at 13:13
  • I'm having the same problem, despite using Heroku's recommended config. No zero-downtime deploy, either. – elsurudo Aug 07 '13 at 11:12
  • Same problem here, and not using preboot plugin. – Adrian Macneil Aug 08 '13 at 20:33
  • One thing I've noticed is that this USUALLY happens on worker dynos. Not always, but usually. – Chris Peters Sep 09 '13 at 11:52
  • Is this Unicorn 4.6.3? Have you tried it on 4.6.2? – Neil Middleton Sep 11 '13 at 20:56
  • This happens for me on Heroku too. Unicorn 4.2.1 and not using the zero downtime plugin. – joseph.hainline Oct 22 '13 at 02:49
  • This post here suggested potential success switching to Puma: http://stackoverflow.com/questions/17763683/error-r12-exit-timeout-using-herokus-recommended-unicorn-config – sdanzig Oct 31 '13 at 14:03
  • We have this as well, but the process does terminate, so we have not put any effort into it. You might want to contact Heroku support directly; they respond quickly and can often help. – Sebastian Nov 29 '13 at 22:21
  • One note: Heroku has a timeout of 30 seconds. They recommend setting your Unicorn timeout to something lower than 30 seconds, and even suggest using 15 seconds (see the sketch after this comment list). See: https://devcenter.heroku.com/articles/rails-unicorn – Sebastian Nov 29 '13 at 22:23
  • I don't know what heroku uses (lxc is mentioned somewhere in their docs), but I had a similar issue with Docker on certain versions of the lxc driver. Some signals would just get lost. This might be totally unrelated, but I put it here for reference: https://github.com/docker/docker/issues/2436 (comments seem to point to the lxc driver) – Nicolas Goy Aug 14 '14 at 01:10
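
Following up on the timeout note in the comments: a minimal unicorn.rb sketch, assuming you adopt the Dev Center recommendation of keeping Unicorn's worker timeout below Heroku's 30-second window (the 15-second value comes from that article; the other settings mirror the question's config):

# config/unicorn.rb (sketch only; values per the linked Heroku Rails/Unicorn article)
worker_processes 3   # unchanged from the question
timeout 15           # below Heroku's 30-second limit, per the Dev Center article
preload_app true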

1 Answer


I think your custom signal handling is what's causing the timeouts here.

EDIT: I'm getting downvoted for disagreeing with Heroku's documentation and I'd like to address this.

Configuring your Unicorn application to catch and swallow the TERM signal is the most likely cause of your application hanging and not shutting down correctly.

Heroku seems to argue that catching and transforming a TERM signal into a QUIT signal is the right behavior to turn a hard shutdown into a graceful shutdown.

However, doing this seems to introduce the risk of no shutdown at all in some cases, which is the root of this bug. Users experiencing hanging dynos running Unicorn should consider the evidence and make their own decision based on first principles, not just the documentation.
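
For illustration, here is a sketch of the question's config with the custom TERM trapping removed, so Unicorn's default signal handling applies. This is one reading of the suggestion above rather than an officially documented config; the ActiveRecord and Sidekiq pieces are carried over unchanged from the question:

# config/unicorn.rb -- sketch: no Signal.trap blocks, Unicorn handles TERM itself
worker_processes 3
timeout 30            # a comment above suggests lowering this below Heroku's 30-second window
preload_app true

before_fork do |server, worker|
  # The master disconnects from the database before forking workers
  if defined?(ActiveRecord::Base)
    ActiveRecord::Base.connection.disconnect!
  end
end

after_fork do |server, worker|
  # Each worker opens its own database connection after the fork
  if defined?(ActiveRecord::Base)
    ActiveRecord::Base.establish_connection
  end

  # Keep the Sidekiq client's Redis pool small, as in the question's config
  Sidekiq.configure_client do |config|
    config.redis = { :size => 1 }
  end
end

Note that Unicorn treats TERM as a quick shutdown and QUIT as a graceful one, which is the trade-off being debated in the comments below.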

Winfield
    Heroku documentation still covers "[Graceful shutdown with SIGTERM](https://devcenter.heroku.com/articles/dynos#graceful-shutdown-with-sigterm)", and I don't see a mention of no longer needing to do this on the Cedar stack. Do you have a reference to where this can be found? – Dennis Mar 27 '14 at 18:23
  • I can't find any documentation that supports this answer. According to both Unicorn and Heroku's documentation, Unicorn still uses the reverse of the POSIX signal interpretation. – Josh Kovach May 15 '14 at 18:41
  • This is not true. Unicorn still does not shut down gracefully without explicit handling of the TERM signal. The Dev Center article supporting this can be found here: https://devcenter.heroku.com/articles/rails-unicorn#config – slant Aug 11 '14 at 21:59
  • I recognize the Heroku docs say you should try to catch/transform these signals. The attempts to shut down gracefully are the most likely root cause of shutdown timeouts. – Winfield Aug 12 '14 at 01:51