Questions tagged [disaster-recovery]

Disaster recovery and preparedness are an unfortunate but necessary part of systems administration. Use this tag for help with planning, implementation and best practices related to recovering from a catastrophic event on a server or in a datacenter environment.

Recovering from an unplanned, catastrophic outage is a painful process whether you are managing a single server or an entire datacenter. Roof leaks, broken water lines, power outages and any number of other events can turn a great day into a living nightmare when you are responsible for keeping available the systems that others rely on.

The key to recovering from any disaster is preparedness. Knowing the steps required to bring the network and systems back online is critical. Before you can properly prepare for a disaster, you need to understand the risks, bottlenecks and other critical components of the overall system, e.g. who controls the power, internet connection, etc. at your site. Knowing which aspects of disaster recovery are within your control is essential when planning; if nobody on staff can fix the power, HVAC, etc., make sure the contact information for someone who can is written down somewhere. Having this information on hand before a disaster occurs helps keep everyone calm, cool and on-task when something actually does happen.

Once the risks are assessed and a plan is created, print out physical copies, email it, and make sure everyone with admin-level access to the systems/datacenter has read it and is familiar with it. The best plan in the world is worthless if it lives only on a system that is down and cannot be restored without following the plan. After everyone is familiar with the plan, practice it when possible; full rehearsals may not always be realistic, but take advantage of planned downtime or naturally occurring outages to walk through the recovery plan and refine it.

In summary, before and during a disaster:

  1. Don't Panic! Panic turns a debacle into a catastrophe every time.
  2. Plan ahead, understand the risks, and know what is within your control.
  3. Follow the plan but be flexible; a recovery plan is more of a jazz tune than a military march.
  4. Stay calm and organized: use checklists and keep notes.
  5. If you are working in a team or group, communicate and collaborate.
  6. Be vigilant, and update your plan as the environment changes.
  7. Check your backups: make sure they happen at regular intervals and that the data they contain is still good.
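
As an illustration of item 7, here is a minimal Python sketch of an automated backup check. It is not part of any particular backup product; the function names `sha256_of` and `check_backup`, the 24-hour freshness threshold, and the idea that a known-good SHA-256 digest was recorded when the backup was written are all assumptions made for the example.

```python
import hashlib
import os
import time


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def check_backup(path, expected_sha256, max_age_hours=24, now=None):
    """Return a list of problems found with one backup file (empty if none).

    expected_sha256 is assumed to have been recorded when the backup was made.
    """
    if not os.path.isfile(path):
        return ["missing: " + path]
    problems = []
    now = time.time() if now is None else now
    # Freshness: the backup job should have rewritten the file recently.
    age_hours = (now - os.path.getmtime(path)) / 3600
    if age_hours > max_age_hours:
        problems.append("stale: last written %.1f hours ago" % age_hours)
    # Integrity: the file should still match the digest recorded at backup time.
    if sha256_of(path) != expected_sha256:
        problems.append("checksum mismatch")
    return problems
```

A matching checksum only shows that the file has not changed since it was written; it does not prove the backup can be restored, so a check like this complements periodic test restores rather than replacing them.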
358 questions
7
votes
5 answers

Local to Remote Webserver Failover

Short and sweet, I don't suppose you'll need more detail than this: We host our website on an in-house webserver. A catastrophe has and will happen again where communication from the web into/out of our building ceases. When this happens, we'd…
anonymous coward
7
votes
2 answers

In 2020 - are there any viable Linux block-level replication alternatives for DRBD?

I'm researching how we can implement near-realtime replication from primary datacenter to a disaster recovery site. Data that would get replicated would be: images of KVM VMs; MySQL and PostgreSQL databases. For the sake of simplicity let's assume…
pQd
6
votes
1 answer

Recovering lost VHDX / VM (deleted by Veeam)

Earlier this week a VM on one of our hypervisors experienced extended downtime (~24 hours) due to some Windows updates going wrong. I ultimately was able to fix the issue, and noticed yesterday that Veeam wasn't backing up the VM anymore as part of…
6
votes
5 answers

What's the difference between a Disaster Recovery Plan and a Business Continuity Plan?

I used to think both terms referred to the exact same thing, but one of my clients just requested to have a look at both documents. The request emanates from the security department of a very big company, so I guess they know what they're talking…
Brann
6
votes
3 answers

Is DFSR designed for use for Disaster Recovery?

We are currently working on implementing a DR strategy. Instead of SAN-SAN replication, it has been decided to have 2 live file servers replicating via DFSR. However, I don't know whether or not this is a good idea. Example: DFS does not replicate…
Bigbio2002
6
votes
2 answers

Automated bare-metal recovery practices for small network

I have several machines which are on a small network with one DC and 3 to 5 workstations on the network at any given time. These are all set up with DNS and AD on the same server. I want the ability to automate a backup, re-image, and restore of…
6
votes
2 answers

Active Directory Disaster Recovery in a Small Business

This is a hypothetical question, but one I’m sure that someone must have encountered and/or given some thought to before. Situation: Consider this, a small business is running an Active Directory domain and has two domain controllers which are located…
6
votes
3 answers

Disaster Recovery Planning, tower or rack?

I'm working on a project to develop a system that delivers centralized information about emergencies via open Wi-Fi in a small city. I'm from Chile, so we designed this system to work especially when an earthquake strikes the city (just…
user78442
6
votes
3 answers

Reconstructing .bashrc from running session

I accidentally deleted my .bashrc. I still have the terminal running. What settings can I recover? I already have the aliases (from the alias command). I assume that all ifs and cases are gone, but I want to retrieve the variables. How can I do…
Ada
6
votes
2 answers

WHEN to put the contingency plan into action in case of a main server failure?

We have a production SQL Server database server shipping transactional log backups to two standby servers. The disaster recovery plan is already finished: we have a well documented procedure and people trained to put the standby server into…
IT2
6
votes
5 answers

Fault tolerant server structure for the smallest of businesses

I'm trying to figure out what to do for a small business that has been plagued by ridiculous hardware problems. Right now, this business runs on five or six desktop machines; no server infrastructure is in place. On top of that, and I'm not…
6
votes
2 answers

Green System Administrator looking for helpful tips

I have just been promoted to Systems Administrator for our product. We are designing an application that communicates with the cloud (Amazon EC2). I will be in charge of maintaining all instances and their underlying components. So far this involves a…
6
votes
2 answers

MySQL replication issues after a power outage

After a power outage at our data centre, the slave MySQL databases are struggling. This is in the logs for one of the slaves: 100118 10:05:56 [Note] Slave I/O thread: connected to master 'repl@db1:3306', replication started in log 'bin-log.004712'…
jabley
5
votes
2 answers

Is it possible to look up distances between AWS data centers?

My company is negotiating with a customer who has requirements for a minimum distance between data centers. Namely they require redundant storage in data centers more than 3 km apart. Is it possible to ensure this by using two different AZs in one…
5
votes
1 answer

DRBD as DR: syncing datastores of 2 ESXI hosts, vmdk consistency?

Does anyone have experience with using DRBD (protocol C) to sync parts of the datastores of 2 ESXi hosts for disaster recovery of selected guests? I have 2-3 guests that should be able to recover from hardware failure of the host in as little time…