
Having experienced a few periods of downtime, we've recently upgraded to a production environment on Heroku (Crane database plus 2 x web dynos); however, we've seen no improvement. In fact, reliability seems to have decreased since upgrading.

The root cause seems to be the following exception: `PG::Error (SSL SYSCALL error: EOF detected)`, which causes the dyno to fail and - eventually - restart, but not before causing some downtime.

I've no idea what's causing it. Common culprits appear to be Resque and Unicorn, neither of which I'm using. We're on Rails 3.2.11, on Heroku Cedar, using pg gem 0.14.1.

Logs report the following at crash time:

    2013-05-23T19:01:33+00:00 app[heroku-postgres]: source=HEROKU_POSTGRESQL_PINK measure.current_transaction=34490 measure.db_size=38311032bytes measure.tables=19 measure.active-connections=7 measure.waiting-connections=0 measure.index-cache-hit-rate=0.99438 measure.table-cache-hit-rate=0.8824
    2013-05-23T19:01:35.123633+00:00 app[web.2]:
    2013-05-23T19:01:35.123633+00:00 app[web.2]: PG::Error (SSL SYSCALL error: EOF detected
    2013-05-23T19:01:35.123633+00:00 app[web.2]: ):

I have read the following, but can't find anything that might help: https://groups.google.com/forum/?fromgroups#!topic/heroku/a6iviwAFgdY

  • Sounds like possible intermittent networking issues between appserver and DB, though it could also be appserver nodes terminating abruptly (say with a process crash) instead of cleanly disconnecting. You need to look at your PostgreSQL logs as well as the appserver logs. – Craig Ringer May 24 '13 at 00:25
  • I suspect exactly that, too. The Heroku Postgres logs don't really show much interesting info; however, looking at the Ruby stack trace: `connection_adapters/postgresql_adapter.rb:294:in 'exec'` `connection_adapters/postgresql_adapter.rb:294:in 'dealloc'`, I'm going to try the following monkey patch: https://github.com/rails/rails/issues/3392#issuecomment-4516400 – kierantop May 24 '13 at 10:16
  • This monkey patch appears to stop recurrences of the above exception. However, it hasn't stopped the underlying issue - Heroku web dynos crashing/timing out, then hanging, much more frequently than I would hope! – kierantop May 27 '13 at 14:47
  • That's something you'll need to take up with Heroku "support". – Craig Ringer May 28 '13 at 00:17

1 Answer


https://gist.github.com/ktopping/5657474

The above fixes the exception, which is useful (it should declutter my logs and even speed up reconnecting to the database), but it doesn't actually stop my main issue, which is Heroku web dynos crashing more often than I would like.
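
For anyone who doesn't want to click through, the gist is a monkey patch along the lines of the workaround from rails/rails#3392 mentioned in the comments above. Roughly (a sketch of the idea; the gist has the exact code): only DEALLOCATE a prepared statement when the connection is still alive, and swallow the PG error raised when the socket has already gone away.

    # config/initializers/postgresql_statement_pool_patch.rb
    # Sketch: skip DEALLOCATE when the connection is already dead, instead of
    # letting the PG::Error bubble up and take the dyno with it.
    require "active_record/connection_adapters/postgresql_adapter"

    module ActiveRecord
      module ConnectionAdapters
        class PostgreSQLAdapter
          class StatementPool
            private

            def dealloc(key)
              @connection.query "DEALLOCATE #{key}" if connection_active?
            rescue PGError
              # the connection is gone; there's nothing left to deallocate
            end

            def connection_active?
              @connection.status == PGconn::CONNECTION_OK
            rescue PGError
              false
            end
          end
        end
      end
    end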

I'm investigating some other routes (Unicorn, rack-timeout).
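
For reference, the Unicorn setup described in Heroku's Rails/Unicorn article looks roughly like this (a sketch; the worker count and timeout values are illustrative). The before_fork/after_fork hooks are the relevant bit for this kind of Postgres error: the master drops its connection before forking and each worker opens its own, so a connection is never shared across processes.

    # Gemfile
    gem "unicorn"

    # config/unicorn.rb
    worker_processes Integer(ENV["WEB_CONCURRENCY"] || 3)
    timeout 15          # kill stuck workers well before Heroku's 30 s router timeout
    preload_app true

    before_fork do |server, worker|
      # drop the master's connection so forked workers don't inherit its socket
      defined?(ActiveRecord::Base) and ActiveRecord::Base.connection.disconnect!
    end

    after_fork do |server, worker|
      # each worker re-establishes its own connection to Postgres
      defined?(ActiveRecord::Base) and ActiveRecord::Base.establish_connection
    end

    # Procfile
    web: bundle exec unicorn -p $PORT -c ./config/unicorn.rb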

  • Having switched to Unicorn and added rack-timeout (in order to - hopefully - get more detailed log messages if/when a dyno times out), I can report 3 days of uptime (where before, 4 hours was doing well!). I'm reluctant to say it's "fixed", but it certainly looks better. Migrating to Unicorn seems a no-brainer, especially as Heroku recommend it themselves: https://devcenter.heroku.com/articles/rails-unicorn – kierantop May 29 '13 at 13:14
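
For anyone following along, the rack-timeout part of that setup was, with the gem versions current at the time, roughly the following (a sketch; the 20-second value is illustrative, and the gem's configuration API has changed in later releases). The point is to make a slow request raise inside the app, with a backtrace in the Rails logs, before Heroku's 30-second router timeout (H12) cuts it off.

    # Gemfile
    gem "rack-timeout"

    # config/initializers/rack_timeout.rb
    # abort requests after 20 s so they fail loudly in the app logs
    # instead of only surfacing as an H12 at the Heroku router
    Rack::Timeout.timeout = 20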