Well, my $379 dedicated server just went cold for the second time this month. I need to find a way to get another server going at a different provider that I can just flip on in case this happens again. What is the best way to deal with this type of situation?
5 Answers
Replacing the broken server.
Seriously - adding a second server works, but it is not that easy, especially when databases get involved. After all, you need to replicate the data in real time.
If a server crashes twice per month, there are two possibilities:
- The admin is incompetent (totally borked drivers etc.), or
- Broken hardware (well, or hardware not supported by the OS).
In either case I would tackle it at that end. Basically: server broken, please replace under warranty. Twice per month is too much.

Finding a solid provider with a decent uptime SLA would be the start, I guess.
As for your question, how to get some form of redundancy going with a second server at a separate location: that requires some knowledge of what you are doing on the server in the first place (hosting web sites? email? some other service?).
A standard LAMP setup can be made relatively redundant using rsync to synchronise files to a standby machine, and DNS (essentially setting up the relevant A records with low TTL values) to allow switching between the two sites.
Downsides: it's a slow and clumsy way of doing it, it requires manual intervention, and it requires DNS that isn't handled by either of the two boxes (not a problem if you're using an outside DNS provider).
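To give a rough idea of the rsync half of that, here is a minimal Python sketch that builds the sync command for a standby box. The hostname, user and web root are made up for illustration, and it assumes SSH key access to the standby is already set up; treat it as a starting point, not a finished setup.

```python
import subprocess

# Hypothetical values for illustration; none of these come from the question.
STANDBY = "backup@standby.example.com"
WEB_ROOT = "/var/www/"


def build_rsync_command(source, host, dest, dry_run=False):
    """Build an rsync command that mirrors `source` to `dest` on `host`."""
    cmd = [
        "rsync",
        "-az",        # archive mode (permissions, times, symlinks) plus compression
        "--delete",   # remove files on the standby that are gone from the primary
        "-e", "ssh",  # transport over SSH
    ]
    if dry_run:
        cmd.append("--dry-run")  # report what would change without copying
    cmd += [source, f"{host}:{dest}"]
    return cmd


def sync_to_standby(dry_run=False):
    """Run the sync; raises CalledProcessError if rsync exits non-zero."""
    cmd = build_rsync_command(WEB_ROOT, STANDBY, WEB_ROOT, dry_run=dry_run)
    subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Only print the command here; call sync_to_standby() from cron to run it.
    print(" ".join(build_rsync_command(WEB_ROOT, STANDBY, WEB_ROOT, dry_run=True)))
```

Note that `--delete` makes the standby an exact mirror, so a mistaken deletion on the primary is copied across too. This only covers files; the database needs its own replication.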
The biggest issue in this type of solution is the database: basic replication of a database is fairly easy (well, relatively), but being able to seamlessly switch back once the outage is over is not. Also, running a system that depends on a remote commit could slow everything down significantly. The scenario is further complicated by having two machines at two providers: a traditional load balancer would be difficult to implement because the networks are physically separate, and the work required for something like HAProxy or a general shared storage solution is on the other side of a diminishing returns curve.
You will spend more time trying to figure out how to deal with the switching (and then the monitoring and management thereof) than actually running a decent service.
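Just to show what the monitoring side involves, here is a small Python sketch of a "should I switch?" check: a TCP probe plus a rule that only fails over after several consecutive failed checks, so a single blip doesn't trigger a switch. The threshold of 3 is an assumption, and the actual switch (updating the A record) is left out because it depends entirely on your DNS provider.

```python
import socket

FAILURE_THRESHOLD = 3  # assumed: consecutive failed checks before failing over


def port_is_open(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def should_fail_over(results, threshold=FAILURE_THRESHOLD):
    """Return True if the most recent `threshold` check results all failed.

    `results` is a list of booleans, oldest first, one per health check.
    """
    if len(results) < threshold:
        return False
    return not any(results[-threshold:])


if __name__ == "__main__":
    # Example history: two good checks followed by three failures.
    history = [True, True, False, False, False]
    print(should_fail_over(history))
```

Run the probe from a third location (not either server), otherwise a network problem at the monitoring box looks like an outage.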
So I guess the answer is as mentioned already: building something that lets you just flip over from one machine to the other depends on what you are doing, but it is almost guaranteed to be more costly and complicated than simply getting a solid, SLA-backed hosting arrangement with a well-organised provider. Do that first, then worry about load balancing and redundancy.

I think your plan is sound. It's clear something is awry with your current provider and your confidence in their ability to provide reliable service is waning. I'd wander over to the dedicated server forum on Web Hosting Talk and see what people are saying about the providers in your price range.

I would suggest you get a backup dedicated server. Obviously it doesn't need to be nearly as beefy as your present server, since it will only be used for backup purposes. I would suggest these guys, www.smarterdedicatedserver.com, since they are really cheap and offer good uptime protection at their datacenter. For setting up the failover or even load balancing, you can use this company, http://www.autofailover.com/, or something similar. You'll also need to keep the two servers synced: I would either write a script to do this nightly and use something like database mirroring (SQL Server) to keep the databases synced up in real time, or use some other replication strategy. It's definitely going to be extra work, but the uptime will be much better. No service can possibly deliver 100% uptime, so if that is important I would set up a second dedicated server.

This is a very broad subject area without more details. Specifically we need to know whether the failures you've had are due to comms link or hardware.
If the former, look at getting co-located or hosted in a secondary datacentre with real-time replication.
If the latter, we need to know how quickly you need to get back online after a failure. This will determine whether we're looking at real-time replication (in the same or another datacenter) or recovery from disk/tape-based backup.
If you want specific details of anything, we need to know what application(s) you're running that need to be highly available.

More specifically, it crashes from an out-of-memory error when OOM Killer comes in and kills access to everything. – james Sep 26 '10 at 16:20