Questions tagged [disaster-recovery]

Disaster recovery and preparedness are an unfortunate aspect of systems administration. Use this tag for help with planning, implementation, and best practices related to recovering from a catastrophic event on a server or in a datacenter environment.

Recovering from an unplanned, catastrophic outage is a painful process whether you are managing a single server or an entire datacenter. Roof leaks, broken water lines, power outages, and any number of other events can turn a great day into a living nightmare when you are responsible for keeping the systems others rely on available.

The key to recovering from any disaster is preparedness. Knowing the steps required to bring the network and systems back online is critical. Before you can properly prepare for a disaster, you need to understand the risks, bottlenecks, and other critical components of the overall system, e.g. who controls the power, internet, etc. at your site. Knowing which aspects of disaster recovery are within your control is an important part of planning; if nobody on staff can fix the power, HVAC, etc., make sure the contact information for someone who can is written down somewhere. Having this information available before a disaster occurs helps keep everyone calm, cool, and on task when something actually does happen.

Once the risks are assessed and a plan is created, print physical copies, email it out, and make sure everyone with admin-level access to the systems or datacenter has read it and is familiar with it. The best plan in the world is worthless if it lives on a system that is down and cannot be restored without following the plan. After everyone is familiar with the plan, practice when possible; a full rehearsal may not be realistic in many situations, but where you can, take advantage of planned downtime or naturally occurring outages to walk through the recovery plan and refine it.

In summary, when a disaster happens:

  1. Don't panic! Panic turns a debacle into a catastrophe every time.
  2. Plan ahead, understand the risks, and know what is within your control.
  3. Follow the plan but be flexible; a recovery plan is more of a jazz tune than a military march.
  4. Stay calm and organized: use checklists and keep notes.
  5. If you are working in a team or group, communicate and collaborate.
  6. Be vigilant, and update your plan as the environment changes.
  7. Check your backups: make sure they happen at regular intervals and that the data they contain is still good.
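The last point can be partly automated. What follows is a minimal sketch, not a definitive implementation: it assumes each backup is a single file and that a SHA-256 checksum was recorded when the backup was written. The names `sha256_of` and `check_backup` and the `max_age_seconds` parameter are hypothetical, chosen only for illustration. A matching checksum only shows that the file has not changed since it was written; it does not prove the backup can be restored, so a periodic test restore is still needed.

```python
import hashlib
import os
import time


def sha256_of(path, chunk_size=1024 * 1024):
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def check_backup(path, expected_sha256, max_age_seconds, now=None):
    """Return a list of problems found with one backup file (empty if none).

    expected_sha256 is assumed to have been recorded when the backup was made.
    """
    if not os.path.isfile(path):
        return ["missing: " + path]
    problems = []
    now = time.time() if now is None else now
    # Age is measured from the file's last modification time.
    age = now - os.path.getmtime(path)
    if age > max_age_seconds:
        problems.append("stale: last written %.0f seconds ago" % age)
    if sha256_of(path) != expected_sha256:
        problems.append("checksum mismatch")
    return problems
```

A scheduled job could call `check_backup` for each expected backup file and alert through a channel that does not depend on the systems being backed up.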
358 questions
4 votes, 5 answers

How do you properly do Disaster Recovery for a file server?

We are currently working on implementing a DR strategy for a Windows file server. We have ruled out Storage Replication because it is a preview feature, and Failover Clustering is designed for high availability, not DR. DFSR also has deficiencies in…
Bigbio2002
4 votes, 4 answers

Best way to replicate / mirror 100s of databases in SQL 2005

I currently host around 400-500 SQL 2005 databases of varying sizes (1-10 GB each). I am aware of most of the different methods available and the general pros/cons of mirroring, log shipping, replication and clustering, but I am not aware of how…
mrwayne
4 votes, 3 answers

Out of band notification when UPS loses mains power?

We're currently looking into upgrading our UPS and possibly having a small petrol generator on site in the eventuality that the UPS battery is used long enough to drain it completely. Realistically, we can only afford a UPS that would give us…
dannymcc
4 votes, 3 answers

Recovering with DDRescue Cannot Complete (write error: Read-only file system)

I'm trying to recover a corrupt VDI using vdfuse to mount the VDI and using dd_rescue to rescue the borked partition. dd_rescue seems to be working fine but once it reached about half of the partition, it just STOPs and gives the following…
4 votes, 1 answer

Blackberry Outage: How can I PIN a large number of users?

We lost power due to the hurricane and need to notify BES users that services will be restored soon. How can I extract all BES users and send them a notification message? Also how do I alter the from address that is sent?
makerofthings7
4 votes, 3 answers

lvm+ext3 vs ext3 recovery

I would like to hear from people who have personal experience recovering crashed systems or data from crashed drives using lvm+ext3 and ext3. Which option is harder to recover from backup or to extract missing data?
Kazimieras Aliulis
4 votes, 1 answer

MSMQ Disaster Recovery - How to recover message queues from a crashed machine?

How can message queues be recovered from a crashed machine, so that transactional messages can be restored on a new machine?
Thomas Bratt
4 votes, 2 answers

Save data after accidental dd format

As embarrassing as it sounds, I managed to dd a Debian ISO to an external HD instead of my USB pen drive. Now my 1.5 TB Western Digital has one 700 MB partition named debian and the rest is unallocated space. If I understand correctly how dd works the…
ndp
4 votes, 6 answers

Need help with estimating required bandwidth for SAN array to SAN array replication over WAN

I have a long-term goal of setting up a DR site in a colo somewhere and part of that plan includes replicating some volumes of my EqualLogic SAN. I'm having a bit of a difficult time doing this because I don't know if my method is sound. This post…
4 votes, 1 answer

MSMQ Disaster Recovery

I'm looking into utilizing MSMQ in our enterprise applications. The one area I haven't been able to find information on is disaster recovery. Scenario: A fire has broken out in the server room. All the equipment has been destroyed and we need to…
Paul Turner
4 votes, 2 answers

When is it appropriate to run chkdsk?

A UPS failed and a Hyper-V box hosting 6 VMs went down (all Windows Server 2008 R2 x64, if it matters). When I booted back up, it did not automatically perform a chkdsk (which I have seen in the past). I didn't notice whether any of the VMs did…
Christopher
4 votes, 1 answer

Recover PostgreSQL database from filesystem backup

I have a PostgreSQL 8.3 data directory backup. I need to copy a database from this backup into a new PostgreSQL instance. Due to problems with the old server I cannot do a pg_dump of the database. I have figured out which directory is for the…
Sean Preston
4 votes, 2 answers

Business Continuity and Disaster Recovery Plans

My company is looking for outside assistance in (re)putting together our BC and DR plans. How do you judge companies from one another in this arena?
mdpc
4 votes, 4 answers

Last time SQL Server downtime or data-loss occurred, what happened?

This isn't a question about how to cope with or limit downtime or data-loss, I know all about that. I'm putting together a 'stories' section for my PASS post-con on disaster recovery and I'd like to be able to share some more recent and impressive…
Paul Randal
4 votes, 3 answers

Is there any way to recover a 2003 Exchange database without the domain controller?

I have access to the Exchange Server but the domain controller RAID card died and the restored backup causes the new server to reboot without even a blue screen. I have rebuilt a new domain controller and another Exchange server. What is the best…