When using AWS ELB, we have a health check URL that pings the DB server, checks outbound connections, etc.

If there are unhealthy instances, they are removed. But if the DB server is down, our web server is designed to continue functioning to the best of its ability (design for failure).

These two concepts seem to conflict with each other. If a single server is not healthy, it gets removed. But if all servers are affected, we want to keep them online.

How can we solve this dilemma?

Sleeper Smith
  • Why are you pinging your db server for a health check? Instrument your health check so that it checks your app servers and so that it fails if your app servers aren't able to get a response from your database. – EEAA Jun 24 '14 at 02:43
  • @EEAA that's what I'm doing and what I mean – Sleeper Smith Jun 24 '14 at 03:24
  • You said that you're pinging your db server for your health check. You should not be doing that. – EEAA Jun 24 '14 at 03:35
  • @EEAA I meant from the web server. The web server tests its connection to the DB server. If the web server can't see the DB server, it goes offline. – Sleeper Smith Jun 24 '14 at 07:03
  • ELB Healthcheck is designed to provide a degree of availability in the EC2 (compute) capacity behind it. If your compute cluster has no means to provide service without accessing the data layer then you must ensure that the data layer provides an availability level at least as good as the compute layer and focus your monitoring efforts on it. – ma.tome Apr 29 '15 at 08:16

1 Answer

Those two concepts, as described, do conflict with one another. The ELB needs to get HTTP 200 responses to the health check, or the instances will be dropped from the ELB.

If you want the application to stay online even after the DB has failed, change the health check URL to one that still responds with HTTP 200 even when the database is down.
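As a rough sketch of that idea, assuming a Python web tier: one shallow endpoint for the ELB that returns 200 whenever the process can serve requests, and a separate deep endpoint for your own monitoring that fails when the database is unreachable. The paths `/health/elb` and `/health/deep`, the `check_database` helper, and the port are made up for illustration and are not anything ELB requires.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer
from typing import Tuple


def check_database() -> bool:
    """Hypothetical placeholder: swap in a real DB probe with a short timeout."""
    return False  # simulates a database outage


def health_response(path: str, db_ok: bool) -> Tuple[int, str]:
    """Return (status code, JSON body) for a health check request."""
    db_state = "ok" if db_ok else "down"
    if path == "/health/elb":
        # Shallow check for the load balancer: the app is up and can serve
        # degraded pages, so answer 200 and only report the DB state.
        return 200, json.dumps({"app": "ok", "db": db_state})
    if path == "/health/deep":
        # Deep check for monitoring and alerting, not for the ELB.
        return (200 if db_ok else 503), json.dumps({"db": db_state})
    return 404, json.dumps({"error": "not found"})


class HealthHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        status, body = health_response(self.path, check_database())
        payload = body.encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)


def serve(port: int = 8080) -> None:
    """Start the health check server (blocks until interrupted)."""
    HTTPServer(("0.0.0.0", port), HealthHandler).serve_forever()
```

Calling `serve()` starts the listener. With this split, a DB outage leaves every instance in service behind the ELB, while a hung or crashed web process still fails the shallow check and gets removed; the deep endpoint is what you would alert on.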

Nathan V