
I've just started a new software development job, but I've basically been given loads of sysadmin stuff to do because the development team consists of only 2 people + the CTO. I've got pretty much zero experience with this kind of stuff and focused mainly on desktop programming at university. So hopefully you guys can help me out with what I'm trying to do here...

So, we'll have two Amazon EC2 instances which start out exactly the same (but in different locations, as Amazon has been pretty volatile lately), with one acting as the master and one as a backup. They both run Windows Server 2008 and have been configured with XAMPP to serve a MySQL/PHP web service. The idea is that, in an outage or the like, we can simply change our DNS settings to point to the backup instance to minimise downtime. As there's a transactional database and file uploads, I need to keep everything in sync. The problem is that the backup instance won't be running all of the time; I'm told it needs to be kept down at all times (except for when we "sync" the servers) in order to save costs.

From what I can tell, master-to-master replication between the databases would work fine, as it would essentially "sync up" when the backup instance comes online. Am I right in thinking that would work? And what would be the best way to sync up the directories? Another issue I've been thinking about: since I'm simply "backing up" the main server to the backup server, how can the process be automatically reversed when the backup server is actually being used (during an outage)?
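For what it's worth, here's roughly the master-master configuration I've pieced together from the MySQL docs - completely untested, and all the hostnames and credentials below are made up, so please tell me if I've misunderstood something:

    # my.ini on instance A (instance B mirrors this with server-id = 2
    # and auto-increment-offset = 2). Values are illustrative only.
    [mysqld]
    server-id                = 1
    log-bin                  = mysql-bin
    auto-increment-increment = 2   # step keys by 2 so the two masters
    auto-increment-offset    = 1   # never generate the same primary key

and then, as far as I understand it, each server gets pointed at the other one:

    -- Run on each instance, pointing at the other.
    -- Log file/position come from SHOW MASTER STATUS on the other server.
    CHANGE MASTER TO
        MASTER_HOST     = 'other-instance.example.com',
        MASTER_USER     = 'repl',
        MASTER_PASSWORD = 'secret',
        MASTER_LOG_FILE = 'mysql-bin.000001',
        MASTER_LOG_POS  = 4;
    START SLAVE;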

Like I said, I'm a total noob when it comes to this kind of thing, but I'm fairly excited to learn. My superiors at work don't seem to have a clue, so I'm pretty keen on becoming the "go to guy". Any help, links, or book recommendations would be greatly appreciated.

sxthomson
  • Well...I don't know how effective a failover it would be if you keep the failover "off". To have redundancy you'd normally look at keeping both servers running at the same time and configuring your database with a mirroring or failover setup, so the databases are consistent and always up to date. – Bart Silverstrim Aug 08 '11 at 20:48
  • Otherwise you'd just have one server and run routine backups to a tape or drive backup on another machine, and restore it to a new system or new instance. – Bart Silverstrim Aug 08 '11 at 20:49
  • If you're minimizing costs by having your failover off most of the time but you want a high-availability system in place, you really can't do it that way. You need to examine what the business really needs, how much your data is worth, and what your downtime would cost you, and come up with a plan to get your business back up and running within the window of time you can afford. – Bart Silverstrim Aug 08 '11 at 20:51
  • We're a really small company, who do need to keep costs down, but we have a number of fairly high profile clients who use the service that we need to keep happy. I've only been working here for a week (my first job out of uni) so I don't really have much of a say in what sort of hosting packages/setups we use. As far as I know, some downtime is fine, it's just an easier method of getting something in place again after a failure is required, currently nothing is in place. – sxthomson Aug 08 '11 at 20:55
  • Again, you might want to re-evaluate your (business) situation. Setting up a database cluster is not something you do with one node off and one on to minimize costs. You'd be better off coming up with a plan for one stable, solid server with a very good backup plan in place if you can't afford to set up a good cluster and don't need the failover/high availability. Since you're looking at a "cloud" solution I won't bother mentioning RAID in there. – Bart Silverstrim Aug 08 '11 at 21:31

2 Answers


The idea is, that in an outage or the like, we can change our DNS settings to point to the backup instance simply to minimise downtime

FFS, I wouldn't call a minimum outage of 3 hours (roughly how long a DNS change can take to propagate) minimising downtime.

Failover to standby is never a good way to implement fault tolerance. Use load balancing to run all the nodes and direct traffic away from failing nodes.

Don't try to reconfigure the architecture on the fly using DNS - it takes WAY too long.

A simple solution is to implement round-robin DNS across the webservers and use master-master replication on the DBMS, with a configurable db connection on each node so you can switch away from a failed DBMS. You could use master-slave replication if realtime sync is a must - and automate the promotion of the slave.
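By a "configurable db connection" I mean something along these lines - a rough PHP sketch, with placeholder hostnames and credentials, where each node tries the DBMS hosts in order and uses the first one that answers:

    <?php
    // Try each DBMS node in order; use the first one that answers.
    // Hostnames and credentials are placeholders.
    function getDbConnection()
    {
        $hosts = array('db1.example.com', 'db2.example.com'); // the master-master pair

        foreach ($hosts as $host) {
            $db = mysqli_init();
            // Short connect timeout so a dead node doesn't hang every request
            $db->options(MYSQLI_OPT_CONNECT_TIMEOUT, 2);

            if (@$db->real_connect($host, 'app_user', 'secret', 'app_db')) {
                return $db; // first healthy node wins
            }
        }

        throw new Exception('No database node reachable');
    }

Reorder $hosts on each webserver (or read the list from a config file) so each node prefers its local DBMS and only crosses the wire when that fails.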

symcbean

A quick and simple approach would be to back up your database every hour, then rsync (for Windows, try DeltaCopy) your backup files and other directories to the backup server. Then, on the backup server, write a script that runs every hour to restore the database from the backup file.

rsync will transfer only the file deltas and will compress as it transmits.
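Since PHP is already installed with XAMPP, the master-side job could be a PHP CLI script run hourly from Task Scheduler - this is just a sketch, and the paths, credentials and rsync destination are placeholders you'd adapt to however DeltaCopy is set up:

    <?php
    // Hourly backup sketch: dump the DB, then rsync the dump plus the
    // upload directory to the standby. Paths/credentials are placeholders.
    $dump = 'C:/backups/app_db.sql';

    // mysqldump ships with XAMPP
    exec("C:/xampp/mysql/bin/mysqldump -u backup_user -psecret app_db > $dump", $out, $rc);
    if ($rc !== 0) {
        fwrite(STDERR, "mysqldump failed\n");
        exit(1);
    }

    // rsync only sends changed blocks; -z compresses in transit
    exec("rsync -az C:/backups/ C:/xampp/htdocs/uploads/ backup-host::backup", $out, $rc);
    exit($rc === 0 ? 0 : 1);

The restore script on the backup server is the same idea in reverse: a scheduled task that feeds the received dump into mysql against the local instance each hour.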

jqa