Questions tagged [disaster-recovery]

Disaster recovery and preparedness is an unfortunate aspect of systems administration. This tag should be used for help with planning, implementation and best practices related to recovering from a catastrophic event on a server or in a datacenter environment.

Recovering from an unplanned, catastrophic outage is a painful process whether you are managing a single server or an entire datacenter. Roof leaks, broken water lines, power outages and any number of other events can turn a great day into a living nightmare when you are responsible for keeping the systems that others rely on available.

The key to recovering from any disaster is preparedness. Knowing the steps required to bring the network and systems back online is critical. Before one can properly prepare for a disaster, it is necessary to understand the risks, bottlenecks and other critical components of the overall system, e.g. who controls the power, internet access, etc. at your site. Understanding which aspects of disaster recovery are within one's control is very important when planning; if there is no one on staff who can fix the power, HVAC, etc., make sure that the contact information for someone who can is written down somewhere. Having as much information as possible available before a disaster occurs will help keep everyone calm, cool and on task when something actually does happen.

Once the risks are assessed and a plan is created, print physical copies, email it, and make sure everyone with admin-level access to the systems/datacenter has read it and is familiar with it. The best plan in the world is worthless if it is stored on a system that is down and cannot easily be restored without following the plan. Once everyone is familiar with the plan, practice it; in many situations a full rehearsal may not be realistic, but where possible take advantage of planned downtime or natural outages to walk through the recovery plan and refine it.

In summary, before and during a disaster:

  1. Don't Panic! Panic turns a debacle into a catastrophe every time.
  2. Plan ahead, understand the risks, and know what is within your control.
  3. Follow the plan but be flexible; a recovery plan is more of a jazz tune than a military march.
  4. Stay calm and organized: use checklists and keep notes.
  5. If you are working in a team or group, communicate and collaborate.
  6. Be vigilant: update your plan as the environment changes.
  7. Check your backups: make sure they happen at regular intervals and that the data they contain is still good.
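Item 7 is the easiest of these to automate. The following is a minimal Python sketch, not a feature of any particular backup product, assuming that backups are written as files into a single directory and that a SHA-256 digest of each file was recorded in a manifest at backup time. The `check_backups` and `sha256_of` helpers and the manifest layout are hypothetical. The sketch flags a stale newest backup (a missed interval) and any file that is missing or no longer matches its recorded digest.

```python
import hashlib
import time
from pathlib import Path
from typing import Optional


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in 1 MiB chunks so large archives never have to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def check_backups(
    backup_dir: Path,
    manifest: dict[str, str],
    max_age_hours: float,
    now: Optional[float] = None,
) -> list[str]:
    """Return a list of problems found; an empty list means every check passed.

    `manifest` (hypothetical format) maps a backup file name to the SHA-256
    hex digest that was recorded when that backup was written.
    """
    now = time.time() if now is None else now
    files = [p for p in backup_dir.iterdir() if p.is_file()]
    if not files:
        return [f"no backup files found in {backup_dir}"]

    problems: list[str] = []

    # "Regular intervals": the newest file must be younger than max_age_hours.
    age_hours = (now - max(p.stat().st_mtime for p in files)) / 3600
    if age_hours > max_age_hours:
        problems.append(
            f"newest backup is {age_hours:.1f} h old (limit {max_age_hours} h)"
        )

    # "Data is still good": every manifest entry must exist and match its digest.
    for name, expected in sorted(manifest.items()):
        path = backup_dir / name
        if not path.is_file():
            problems.append(f"{name}: missing")
        elif sha256_of(path) != expected:
            problems.append(f"{name}: checksum mismatch")
    return problems
```

In practice you might run something like this from cron after each backup window and alert someone whenever the returned list is non-empty. A matching checksum only proves the file has not changed since it was written; a periodic test restore is still the only real proof that the data is usable.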
358 questions
2 votes, 10 answers

Desktop standby solution?

Does anyone know of a good solution to have a Windows desktop standby so that users can be back up and running quickly in case of hardware failure? Assumption is that the desktops aren't generic but highly customized for individual users. Although…
2 votes, 3 answers

Bare Metal Restore with Backup Exec 11d on Win 2003

So, I'm looking at a bare metal restore situation with my current setup. Not sure if this is possible, but looking at a hypothetical restore path to bounce off of with some Server Fault gurus ;). As of right now, our DC currently hosts our Exchange…
2 votes, 3 answers

Where to start with Microsoft Exchange 2003

I am quite new to Exchange in general. I have done some server administration, but never with an Exchange box. Can someone point me in a direction that will give me a good overview and a set of best practices? I inherited a box with Exchange on…
Chad Harrison
2 votes, 3 answers

How can I tell if my hard drive(s) have Battery Backed Write Cache?

How can I tell if my hard drives have a battery backed write cache (BBWC)? How can I tell if it is enabled and/or configured correctly? I don't have physical access to my server. It's a GNU/Linux box. I can provide supplemental incremental…
Riedsio
2 votes, 4 answers

What are the options for synchronizing files between Linux servers in real time without an intermediary or remote share?

Quickfix (an open source FIX Engine) persists state information and sent/received messages in the filesystem of the server (Linux in this case). For disaster recovery, I should like these files to be kept up to date in near-realtime on a standby…
Rym
2 votes, 8 answers

Accidentally rm -rf /usr/* as root, what now?

A colleague of mine accidentally deleted /usr/* data by running: rm -rf /usr/*. And it's now a big issue. We had a lot of good data on that machine. Most of the commands are not working as a result. Is there any way I can recover the machine? I'm not…
pavanlimo
2 votes, 1 answer

SharePoint Quiesce Servers

Environment: SharePoint 2007 (Standard) intranet publishing site on a small two-server farm. We need to be able to plan a shutdown operation, and I thought that I would have to keep the two servers in sync when powering down. The STSADM 'quiescefarm'…
IrishChieftain
2 votes, 3 answers

Need a recipe following a hack disaster

I have inherited a Linux Apache CentOS Plesk server that has been hacked and that hosts websites in production. I have been advised by my friend to rebuild it from scratch, since the attack apparently is quite widespread and it is hard to…
Daniel Higgins
2 votes, 2 answers

Imaging a colocated server

I need some advice on whether something is possible. Here is the scenario... I have a colocated Windows 2008 R2 DC Edition server in Chicago (far from me), so I can only access it via RDC. It has a 1 TB drive as the main drive and another 1 TB drive that…
2 votes, 2 answers

Do MSDTC and disaster recovery go together?

Our application writes to multiple SQL Server databases within a distributed transaction. The Ops guys are saying that this messes up their disaster recovery plan because while the transactions on the live tables may commit at the same time, the log…
DevDelivery
2 votes, 3 answers

Rebuild or repair?

Possible Duplicate: Updating Malware cleaning skills I was having an argument the other day regarding damaged systems. If a system has a hard to eradicate virus, etc, or has been damaged by a software install, etc, do you advocate rebuilding the…
Robot
2 votes, 4 answers

Standard database backup procedures

I've been looking at the setup we've implemented for a hosted client, who has a database backed up several times per day (and the last week of backups always available). Say the backups require 20 GB, and the drive they are hosted on (partition,…
Dave
2 votes, 2 answers

How can I recover overwritten labels, pointer blocks and ueberblocks in a ZFS pool?

I caused a stupid accident in a single-disk ZFS pool, seemingly in the same way as the person in this mailing list thread, i.e., I seem to have overwritten important metadata. Can this be restored from the actual payload, or is there a way to…
Hanno Fietz
2 votes, 1 answer

Exchange Server 2010: move mailboxes from recovered and mounted edb to user’s mailbox

One of our Exchange servers crashed, and I am trying to recover the mailboxes. We had one Exchange 2003 server named "apex" and one Exchange 2010 server named "2008Enterprise". The Exchange 2010 server named "2008Enterprise" crashed. I created a new…
user36090
2 votes, 2 answers

How do I get an SQL Database in "Mirrored, Disconnected / In Recovery" into service?

I have an SQL Server 2005 mirrored database, with just the primary and secondary servers, no witness. Tonight the primary has gone down and will not be back on-line for some time yet. The secondary server is still running, but is "Mirrored,…
Anthony K