Questions tagged [disaster-recovery]

Disaster recovery and preparedness is an unfortunate aspect of systems administration. This tag should be used for help with planning, implementation and best practices related to recovering from a catastrophic event on a server or in a datacenter environment.

Recovering from an unplanned, catastrophic outage is a painful process whether you are managing a single server or an entire datacenter. Roof leaks, broken water lines, power outages and any number of other events can turn a great day into a living nightmare when you are responsible for keeping the systems that others rely on available.

The key to recovering from any disaster is preparedness. Knowing the steps required to bring the network and systems back online is critical. Before you can properly prepare for a disaster, you need to understand the risks, bottlenecks and other critical components of the overall system, e.g. who controls the power, internet connection, etc. at your site. Knowing which aspects of disaster recovery are within your control is just as important when planning: if nobody on staff can fix the power, HVAC, etc., make sure the contact information for someone who can is written down somewhere. Having this information on hand before a disaster occurs helps keep everyone calm, cool and on task when something actually does happen.

Once the risks are assessed and a plan is created, print physical copies, email it, and make sure everyone with admin-level access to the systems or datacenter has read it and is familiar with it. The best plan in the world is worthless if it lives only on a system that is down and cannot be easily restored without following the plan. After everyone is familiar with the plan, practice it when possible; a full rehearsal is often not realistic, but take advantage of planned downtime or naturally occurring outages to walk through the recovery plan and refine it.

In summary, when a disaster happens:

  1. Don't panic! Panic turns a debacle into a catastrophe every time.
  2. Plan ahead, understand the risks, and know what is within your control.
  3. Follow the plan but be flexible; a recovery plan is more of a jazz tune than a military march.
  4. Stay calm and organized: use checklists and keep notes.
  5. If you are working in a team or group, communicate and collaborate.
  6. Be vigilant, and update your plan as the environment changes.
  7. Check your backups: make sure they happen at regular intervals and that the data they contain is still good.
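
The last item in the list can be partly automated. The following is a minimal sketch, not a definitive implementation: it assumes each backup is a single file on disk and that a SHA-256 checksum was recorded when the backup was written. The function names `sha256_of` and `check_backup` and the `max_age_hours` parameter are hypothetical and chosen only for illustration.

```python
import hashlib
import time
from pathlib import Path
from typing import List, Optional


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def check_backup(
    path: Path,
    expected_sha256: str,
    max_age_hours: float,
    now: Optional[float] = None,
) -> List[str]:
    """Return a list of problems; an empty list means both checks passed.

    Assumption: the backup is one file, and expected_sha256 was recorded
    at the time the backup was taken.
    """
    if not path.is_file():
        return [f"{path}: backup file is missing"]

    problems = []
    now = time.time() if now is None else now

    # Freshness: was the backup written within the expected interval?
    age_hours = (now - path.stat().st_mtime) / 3600
    if age_hours > max_age_hours:
        problems.append(
            f"{path}: last modified {age_hours:.1f} h ago (limit {max_age_hours} h)"
        )

    # Integrity: does the file still match the checksum recorded at backup time?
    if sha256_of(path) != expected_sha256.lower():
        problems.append(f"{path}: checksum does not match the recorded value")

    return problems
```

A scheduled job could call `check_backup` for each backup file and alert when the returned list is not empty. A matching checksum only shows that the file has not changed since it was written; it does not show that the backup can actually be restored, so periodic test restores are still needed.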
358 questions
5
votes
4 answers

Hard drive data rescue services

Inspired by the question Crashed Hard Drive Data Retrieval which is currently concentrating on software tools and self-help methods of data rescue, I am curious what the community's experiences are with data rescue services. These are companies that…
RBerteig
  • 651
  • 1
  • 5
  • 13
5
votes
4 answers

Virtualization for hardware resiliency?

Can anyone tell me if it is possible to pool several physical servers to run a resilient virtualization environment? Our servers are getting more and more critical to our clients and we want to do everything we can to improve resiliency in the event…
Kev
  • 249
  • 1
  • 10
5
votes
2 answers

Best practice for IIS 6.0 (Windows Server 2003) backups?

What is the best backup strategy for saving IIS 6.0 data: web metadata, files, logs etc. for disaster recovery?
splattne
  • 28,508
  • 20
  • 98
  • 148
5
votes
4 answers

Disaster Recovery/Sabotage Protection for a small business

I've been contacted by two partners in a small professional firm. They are concerned about their other partner and want to take some steps to be absolutely sure that the company's data and systems are safe from "any eventuality." They have one…
Ward - Trying Codidact
  • 12,899
  • 28
  • 46
  • 59
5
votes
5 answers

Edit Hard Disk Serial Number with VMware

I'm virtualizing a Rockwell AssetCentre Server and I'm looking at Disaster Recovery scenarios. This server contains a lot of other Rockwell Software like RSLinx, Logix 5000, Logix 500, and more... Software activations for Rockwell work in a very…
Lucretius
  • 459
  • 1
  • 4
  • 14
5
votes
6 answers

Can A Virtual Machine Be Converted to a Virtual Server e.g. VMware?

The Simple Question: Can I convert an existing VM to a Virtual Server (e.g. VMware)? I'm using Oracle's one and only awesome product, VirtualBox, and I'm trying to set up a SharePoint Farm to migrate our existing, non-virtual SharePoint Farm to. The…
5
votes
1 answer

Mysql disaster recovery

We just had a major disaster: somebody ran an uncontrolled update on the production database, and the backup process has evidently not been working for a long time, so we suffered a major data loss. A 40-million-row table is now full of garbage. Does…
Alexis Dufrenoy
  • 235
  • 1
  • 3
  • 11
5
votes
5 answers

LVM vs RAID0 vs RAID "linear" - Combine 2 disks as one, data recovery?

Given two 2TB USB external disks that have to be combined into one 4TB volume and formatted with one big filesystem (XFS), I have a small question to ask. Does LVM provide better data recovery, should one disk be unplugged/damaged by being able to…
leto
  • 261
  • 2
  • 5
  • 11
5
votes
3 answers

How can you recover a SQL Server database if the ldf file has been deleted

We had a drive die and lost the ldf file, but the mdf file is intact. Is there a process for re-connecting to the mdf file, given that the ldf is lost? I have searched without much luck.
Tom Lianza
  • 331
  • 1
  • 3
  • 11
5
votes
1 answer

How to get cheap disaster recovery for a 124 TB Isilon filesystem?

On our Isilon cluster, we have a 124 TB file system. It is currently 38 percent full, with 31 million files. About half the data are image files, and the mean file size is 1.5 MB. We use snapshots to protect against accidental deletion, but we…
Vebjorn Ljosa
  • 662
  • 1
  • 5
  • 13
4
votes
10 answers

How do you recover when your hosting provider loses everything?

You've probably seen the messages at the stackoverflow blog and on codinghorror: blog.stackoverflow.com experienced 100% data loss at our hosting provider, CrystalTech. We're working to restore it from backups ASAP! Some of the stuff Jeff's doing…
Ward - Trying Codidact
  • 12,899
  • 28
  • 46
  • 59
4
votes
1 answer

Cannot login to restored domain controller unless NIC removed

We have four DCs (Windows 2008 R2) running on VMware, backed up with Veeam. Currently doing some DR testing. If I restore a DC (full VM) from backup (obviously isolated from our production environment) I cannot log in. I receive the error…
Matt
  • 1,893
  • 5
  • 28
  • 40
4
votes
2 answers

Is it compulsory to pick only a region pair as Azure DR region?

Certain Azure VM types (like some in G & M series that I require) are not available in the Azure region pair (US East, US West). Will there be any constraints on functionality if I choose an Azure region other than the Azure region pair (say, US…
mvp
  • 91
  • 5
4
votes
2 answers

Build mirroring on top of SQL2008R2 SQL cluster

On our SharePoint 2010 farm, we are using SQL 2008 R2. Currently, a SQL cluster (with 2 SQL servers in the same data center) is built to provide auto-failover. However, it has no DR capability. We have been asked to provide DR capability for the system.…
Mark
  • 217
  • 3
  • 11
4
votes
2 answers

Can't get my RAID array out of degraded mode

I've got a 4-drive RAID 10 array that has just had a drive failure. I ignorantly have never practiced how to recover from a failure (I'm a programmer, just keeping this server as a hobbyist) so I'm having to learn this all the hard way right now. I…
KOGI
  • 143
  • 7