Questions tagged [disaster-recovery]

Disaster recovery and preparedness are an unfortunate aspect of systems administration. Use this tag for help with planning, implementation and best practices related to recovering from a catastrophic event on a server or in a datacenter environment.

Recovering from an unplanned, catastrophic outage is a painful process whether you are managing a single server or an entire datacenter. Roof leaks, broken water lines, power outages and any number of other events can take what was a great day and turn it into a living nightmare when you are responsible for keeping systems others rely on available.

The key to recovering from any disaster is preparedness. Knowing the steps required to bring the network and systems back online is critical. Before you can properly prepare for a disaster, you need to understand the risks, bottlenecks and other critical components of the overall system, e.g. who controls the power, internet connection, etc. at your site. Knowing which aspects of disaster recovery are within your control is very important when planning; if nobody on staff can fix the power, HVAC, etc., make sure that the contact information for someone who can is written down somewhere. Having this information available before a disaster occurs will help keep everyone calm, cool and on task when something actually does happen.

Once the risks are assessed and a plan is created, print out physical copies, email it, and make sure everyone with admin-level access to the systems or datacenter has read it and is familiar with it. The best plan in the world is worthless if it is stored on a system that is down and cannot be easily restored without following the plan. After everyone is familiar with the plan, practice it when possible; a full rehearsal may not be realistic in many situations, but where you can, take advantage of planned downtime or natural outages to go through the recovery plan and refine it.

In summary, when a disaster happens:

  1. Don't Panic! Panic turns a debacle into a catastrophe every time.
  2. Plan ahead, understand the risks, and know what is within your control.
  3. Follow the plan but be flexible; a recovery plan is more of a jazz tune than a military march.
  4. Stay calm and organized; use checklists and keep notes.
  5. If you are working in a team or group, communicate and collaborate.
  6. Be vigilant; update your plan as the environment changes.
  7. Check your backups; make sure they happen at regular intervals and that the data contained therein is still good.
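
Point 7 lends itself to automation. The following is a minimal Python sketch, not a definitive implementation: it assumes each backup is a single file and that a SHA-256 checksum was recorded when the backup was written. The function names and the age threshold are hypothetical, chosen only for illustration.

```python
import hashlib
import time
from pathlib import Path
from typing import List, Optional


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def check_backup(
    path: Path,
    expected_sha256: str,
    max_age_hours: float,
    now: Optional[float] = None,
) -> List[str]:
    """Return a list of problems found; an empty list means both checks passed.

    Assumption: expected_sha256 was recorded when the backup was created.
    """
    if not path.is_file():
        return [f"missing: {path}"]

    problems = []
    current_time = time.time() if now is None else now

    # Freshness: the file's modification time stands in for "last backup run".
    age_hours = (current_time - path.stat().st_mtime) / 3600
    if age_hours > max_age_hours:
        problems.append(f"stale: {path} is {age_hours:.1f} h old")

    # Integrity: the contents must still match the recorded checksum.
    if sha256_of(path) != expected_sha256:
        problems.append(f"checksum mismatch: {path}")

    return problems
```

Passing both checks shows only that the file is recent and unchanged since the checksum was recorded; a periodic test restore is still the only real proof that a backup is usable.
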
358 questions
4 votes, 4 answers

DPM Bare Metal Restore

We currently use Microsoft's Data Protection Manager (DPM) for backups in a relatively small environment, but I'm still worried about our main Active Directory server. I've been thinking that it'd be great to have a bare metal restore option for…
Chris Roberts
4 votes, 3 answers

Virtualisation for Disaster Recovery

Can anyone give me any ideas/links so that I can get a better idea of how virtualisation can help me from a disaster recovery point of view? We have a server sitting in a datacentre; it basically has a bunch of web services that sit on the…
Geoff Wray
4 votes, 2 answers

Scripting an automated SQLServer 2008 DR move

We use the built-in log shipping in SQL Server to log ship to our DR site, but once a month we do a DR test, which requires us to move back and forth between our live and backup servers. We run multiple (30) databases on the system so manually backing up…
ItsAMystery
3 votes, 3 answers

How to do a snapshot of a Datastore on esxi vmware - datastore snapshot as fast recovery backup

I have a server with hundreds of VMs, and snapshots are very useful to me: on incidents I can restore a VM to a point in time in no time. However, I have many datastores where operational data is stored, and all backups are done on LTO tapes, but the restore…
3 votes, 3 answers

How long will it take to create a new EBS volume from a 1TB snapshot?

I am taking periodic snapshots of a 1TB EBS (Amazon Web Services Elastic Block Store) volume as backup. In the case of the whole AZ (Availability Zone) becoming unavailable, my Disaster Recovery plan is to create a new EBS volume from the latest…
3 votes, 2 answers

Allowed production maintenance during business hours

I recently came into managing a small startup. Like most small startups, I would think, we have been doing what we wanted in production virtually whenever we thought it was okay. People are careful and things have worked very well. We have also been able…
Telavian
3 votes, 1 answer

Azure regional pairing for G series VMs

Currently, G series VMs are only supported by the US East 2 and US West Azure regions, which are not a DR pair. How can disaster recovery for G series VMs in the US be handled?
mvark
3 votes, 1 answer

What Windows Server Roles should system state be backed up on?

Currently I have the following types of servers, all with single roles per server: AD (DNS, AD, Sysvol, COM+, certificate stores, registry); IIS (metabase); Exchange; FileServer; MSSQL; FileServer; TerminalServer; HyperV Host. What are the drawbacks of not…
3 votes, 5 answers

Bare-Metal Restores of Linux Servers with Tivoli

We currently use IBM Tivoli to back up our Linux servers and we are looking for suggestions on the best way to restore to bare metal. I've read IBM's doc on this issue. Is that still relevant or is there a better way? Also, how do you handle…
Chad P
3 votes, 3 answers

Probability of Blade Chassis Failure, Redundancy

Suppose I have a blade server HP C7000, with three blades. Q1. Is there any disaster recovery technique that provides redundancy between blades? If any one blade goes down the second should come up with the same configuration. Q2. Is it possible to…
3 votes, 2 answers

SQL Server 2005 Mirrored DB Recovery

Scenario: We want to use SQL Server 2005 Standard's version of DB mirroring along with a witness server in an Active Directory domain environment. The database is fed from a 3rd party app server that cannot be modified apart from the DB connection…
Matt Rogish
3 votes, 1 answer

Recover mdadm RAID 10 array: listing all as spares

I'm a bad person and haven't backed up my RAID elsewhere. I now have a RAID10 array that won't assemble, and I'm hopeful I can save it. Here are the details: I have five hard drives set up as RAID10 (4+1 spare). For unknown reasons, two failed and…
Eddie Parker
3 votes, 4 answers

How can I create a "fail whale" for my website?

I have a website that we load-balance across a few machines. The load-balancer (a Brocade ServerIron ADX) is on the local network. I know it has the capability to configure a "backup" ip address to use as the "real", but it would need to be on a…
3 votes, 1 answer

Exchange 2010 Database dirty shutdown

We experienced a power failure; after I brought the server up, the mailbox database was in a dirty shutdown state. I ran eseutil /r E00 and got PS E:\BACKUPS_MDB_EXCHANGE\Mailbox Database 1773415643> eseutil /r E00 Extensible Storage Engine…
GriffinHeart
3 votes, 3 answers

switching dns server providers

I'm trying to wrap my head around something that I thought I kinda understood, but clearly there's some piece missing. We're currently using Zerigo as our primary DNS, with slave DNS running on Linode. This works quite well. However, recent DDOS…
Yoav Aner