Failover strategy for database application

Question

I have a database application that both reads and writes, and that holds a local cache. If the application server fails, a backup server should take over.

The primary and backup applications must never run at the same time, because of the local cache and a low isolation level on the database.

As far as my knowledge of communication goes, it is impossible for the two servers to always agree on which one of them is allowed to run.

Can I resolve this conflict by using the database as a third party? I think this is a fairly common problem, and there may be no 100% safe method, but I would like to know how other people solve it, or whether there is a best practice for it.

It is acceptable if neither application runs for 30 minutes or so, but that is not enough time to get people out of bed and have them work out what the problem is.
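Using the database as the third party is commonly done with a lease: a single row records which node may run and until when, and a node only takes over through one conditional UPDATE, so the database itself serializes the two servers' attempts. The following is a minimal sketch, assuming Python's built-in `sqlite3` as a stand-in for the shared database; the table `app_lease`, its columns, and the functions `init_lease` and `try_acquire` are hypothetical names.

```python
import sqlite3

# Assumption: sqlite3 stands in for the shared application database.
# The table "app_lease" and its columns are hypothetical.
def init_lease(conn: sqlite3.Connection) -> None:
    conn.execute(
        "CREATE TABLE IF NOT EXISTS app_lease ("
        " name TEXT PRIMARY KEY, holder TEXT, expires_at REAL)"
    )
    # One row for the protected role; nobody holds it initially.
    conn.execute(
        "INSERT OR IGNORE INTO app_lease (name, holder, expires_at)"
        " VALUES ('writer', NULL, 0)"
    )
    conn.commit()


def try_acquire(conn: sqlite3.Connection, node: str,
                now: float, ttl: float) -> bool:
    """Take or renew the lease with a single conditional UPDATE.

    The row changes only if this node already holds the lease or the
    previous holder's lease has expired, so two nodes cannot both win
    the same expired lease.
    """
    cur = conn.execute(
        "UPDATE app_lease SET holder = ?, expires_at = ?"
        " WHERE name = 'writer' AND (holder = ? OR expires_at < ?)",
        (node, now + ttl, node, now),
    )
    conn.commit()
    return cur.rowcount == 1  # 1 row updated means we hold the lease
```

Each node would call `try_acquire` periodically (for example every `ttl / 3` seconds), and the primary would have to stop processing and discard its cache as soon as a renewal fails. Taking `now` from the database's own clock avoids relying on the two servers' clocks agreeing, and the 30-minute outage budget allows a generous TTL. A lease alone does not stop a primary that stalls for longer than the TTL and then resumes writing, so a real deployment would also check the lease inside each write transaction.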

It's not actually as common a problem as you might think, as app servers are designed to be clustered and scale horizontally, so that the failure of a single node is transparent and inconsequential. Database servers, on the other hand, are often failed over in the way you describe, for obvious reasons. — Kirk Woll, Apr 08 '12 at 17:22
Thanks, "application server" may have been the wrong term. It is a separate server running an application that is tightly coupled to the database, a kind of long arm of the database, and it is not designed to scale at all. — Franz Kafka, Apr 08 '12 at 17:29

score 1 · Accepted Answer · answered Apr 08 '12 at 21:13

Can you set up a third server that monitors the health of both application servers? If one of the servers appears to be gone, this server can decide what to do, for example instruct the hot standby to start processing.
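A minimal sketch of such a monitor, assuming it probes on a fixed schedule and promotes the standby only after several consecutive failed probes, so that a single lost packet does not trigger a failover. For brevity only the primary is probed here. The class `FailoverMonitor` and the injected `probe_primary` and `promote_standby` callables are hypothetical; the answer does not say how the health check or the promotion would be carried out.

```python
from typing import Callable


class FailoverMonitor:
    """Hypothetical watchdog running on the third server."""

    def __init__(self, probe_primary: Callable[[], bool],
                 promote_standby: Callable[[], None],
                 max_failures: int = 3) -> None:
        self.probe_primary = probe_primary      # returns True if primary is healthy
        self.promote_standby = promote_standby  # tells the hot standby to start
        self.max_failures = max_failures
        self.failures = 0
        self.promoted = False

    def tick(self) -> None:
        """Run one health check; promote after N consecutive misses."""
        if self.promoted:
            return  # in this sketch failover happens at most once
        if self.probe_primary():
            self.failures = 0  # any success resets the counter
            return
        self.failures += 1
        if self.failures >= self.max_failures:
            self.promote_standby()
            self.promoted = True
```

Calling `tick()` every 5 minutes with `max_failures=3` would react within roughly 15 to 20 minutes, inside the question's 30-minute budget. The monitor cannot tell a dead primary from a broken link between itself and the primary, so the promotion step would ideally also fence off the old primary before the standby starts.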

usr

score 0 · Answer 2 · answered Apr 08 '12 at 21:19

If I get the picture right, your backup server constantly polls the primary server for data updates. It wouldn't be hard to detect when a poll fails, retry it 30 seconds later up to three times, and on the third failure dynamically update the DNS entry used to reach the database server so that it reflects the change of active server. Both Windows DNS and BIND accept dynamic updates, signed and unsigned.
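The retry-then-switch logic could look like the following sketch, assuming three attempts in total spaced 30 seconds apart. `poll` stands for the backup's existing data-update poll and `switch_dns` for whatever performs the dynamic DNS update (with BIND this could be a call to the `nsupdate` tool); both are hypothetical hooks, and `sleep` is injectable so the logic can be tested without waiting.

```python
import time
from typing import Callable


def poll_with_failover(poll: Callable[[], bool],
                       switch_dns: Callable[[], None],
                       retries: int = 3,
                       delay_s: float = 30.0,
                       sleep: Callable[[float], None] = time.sleep) -> bool:
    """Return True if the primary answered; otherwise repoint DNS and return False.

    `poll` and `switch_dns` are hypothetical hooks supplied by the caller.
    """
    for attempt in range(retries):
        if poll():
            return True  # primary is alive, nothing to do
        if attempt < retries - 1:
            sleep(delay_s)  # wait before the next attempt
    switch_dns()  # every attempt failed: point clients at the backup
    return False
```

Because resolvers cache records, the DNS entry would need a short TTL for clients to pick up the change quickly. Like any check run from the backup alone, this cannot distinguish a dead primary from a broken link between the two servers.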