Questions tagged [disaster-recovery]

Disaster recovery and preparedness are an unfortunate aspect of systems administration. Use this tag for help with planning, implementation, and best practices related to recovering from a catastrophic event on a server or in a datacenter environment.

Recovering from an unplanned, catastrophic outage is a painful process whether you are managing a single server or an entire datacenter. Roof leaks, broken water lines, power outages, and any number of other events can take what was a great day and turn it into a living nightmare when you are responsible for keeping the systems others rely on up and running.

The key to recovering from any disaster is preparedness. Knowing the steps required to bring the network and systems back online is critical. Before you can properly prepare for a disaster, you need to understand the risks, bottlenecks, and other critical components of the overall system, e.g. who controls the power, internet access, etc. at your site. Knowing which aspects of disaster recovery are within your control is essential when planning; if nobody on staff can fix the power, HVAC, etc., make sure that the contact information for someone who can is written down somewhere. Having this information available before a disaster occurs will help keep everyone calm, cool, and on task when something actually does happen.

Once the risks are assessed and a plan is created, print out physical copies, email it around, and make sure everyone with admin-level access to the systems/datacenter has read it and is familiar with it. The best plan in the world is worthless if it is stored on a system that is down and cannot be easily restored without following the plan. After everyone is familiar with the plan, practice it when possible; a full rehearsal may not always be realistic, but take advantage of planned downtime or naturally occurring outages to walk through the recovery plan and refine it.

In summary, when a disaster happens:

  1. Don't panic! Panic turns a debacle into a catastrophe every time.
  2. Plan ahead, understand the risks, and know what is within your control.
  3. Follow the plan but be flexible; a recovery plan is more of a jazz tune than a military march.
  4. Stay calm and organized: use checklists and keep notes.
  5. If you are working in a team or group, communicate and collaborate.
  6. Be vigilant; update your plan as the environment changes.
  7. Check your backups: make sure they happen at regular intervals and that the data they contain is still good.
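
The last point can be partly automated. The following is a minimal Python sketch, not a definitive tool: it assumes each backup is a single file and that a SHA-256 checksum was recorded when the backup was written. The function names `sha256_of` and `check_backup` and the 24-hour limit are illustrative choices, not part of any particular backup product. A matching checksum only shows that the file has not changed since the checksum was recorded; the only real proof that a backup is good is a test restore.

```python
import hashlib
import time
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def check_backup(path, expected_sha256, max_age_hours=24.0, now=None):
    """Return a list of problems found; an empty list means both checks passed.

    expected_sha256 is assumed to have been recorded when the backup was made.
    now is a Unix timestamp and defaults to the current time.
    """
    path = Path(path)
    if not path.is_file():
        return [f"{path}: missing"]

    problems = []
    now = time.time() if now is None else now

    # Freshness: did the backup job actually run within the expected interval?
    age_hours = (now - path.stat().st_mtime) / 3600
    if age_hours > max_age_hours:
        problems.append(
            f"{path}: last written {age_hours:.1f} h ago (limit {max_age_hours} h)"
        )

    # Integrity: is the file still byte-for-byte what was originally written?
    if sha256_of(path) != expected_sha256:
        problems.append(f"{path}: checksum mismatch")

    return problems
```

Run from a scheduled job, a non-empty result from `check_backup` could be mailed or paged to whoever owns the recovery plan, so that a silently failing backup is noticed before it is needed.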
358 questions
1
vote
3 answers

Mysql server crashes without leaving any log behind

I have a MySQL server (v5.6) running on Ubuntu Server 10 x64. It is normally up and running with medium traffic, but every so often it crashes and restarts without leaving any log message behind. After restarting, it begins crash recovery, which usually…
Ehsan Khodarahmi
1
vote
1 answer

create bootable ISO from USB flash drive recovery media with imgburn or other tool

I need to wipe my HDD and its partitions, but first I created a recovery USB drive with the recovery tool; the recovery media is for a ThinkPad E431 and was created successfully. I can't rely only on a fragile USB drive to hold this copy and want to keep a backup on my external HDD…
Karl Moser
1
vote
1 answer

Can Azure Backup and ASR protect the same workload?

I plan to use both Azure Site Recovery for high availability & DR of on-prem workloads replicated on Azure & Azure Backup for safety & recoverability of data related to the same set of workloads. I came across this statement online (from September…
mvark
1
vote
0 answers

Recovering MySQL data from files on Windows

I had a server fail (disk failure, old server; no raid) at the weekend and am unable to recover the MySQL databases from the backup. After exhausting all the usual recovery options, I've performed a disk recovery and have managed to retrieve the…
1
vote
2 answers

RAID 5 RECONSTRUCT with RAID Reconstructor

I have a Dell PowerEdge 2600 server with RAID 5 across three 36 GB SCSI hard drives; it fails to boot since the third drive is offline. I attached a SATA hard drive via a SATA adapter card, installed Server 2003 on it, downloaded drivers for the RAID and…
user22914
1
vote
2 answers

Azure-to-Azure Disaster Recovery orchestration

I have been looking into this lately and have found no native solution for Azure-to-Azure disaster recovery. Do you think it is completely fair to trust the Azure cloud to take care of the risks that a disaster brings…
1
vote
2 answers

DR - Server 2003 Std SP2 Restore AD to different hardware

What I'm trying to accomplish might be the best way to start this. I have to do a DR test. I'm given 2 days to rebuild AD, Fileserver and SQL from scratch from Dell/IBM hardware to HP Server DL380 Server. What I started out doing was building a…
David Gargan
1
vote
2 answers

How to recover an Ubuntu Server which has partial permanent data loss?

I have an Ubuntu Server whose root partition is on a RAID 0 where one HDD had data loss / broken sectors. The data loss only impacts 1-2% of the total data on that root partition; however, the server either doesn't boot correctly and only starts…
Jey DWork
1
vote
1 answer

How much memory is required to send deduplicated ZFS stream?

Last year I set up a pair of servers for my employer, running FreeBSD 10.1 with a large pool of storage in each server. 12 x 2TB disks, in a zpool configured as two raidz2 vdevs of six disks each. One of these servers is a standby and is a replica…
William S.
1
vote
1 answer

Replay/Truncate log files exchange 2010 without having ever made a backup

I have an Exchange 2010 server where the domain controller/DNS has crashed and is beyond recovery (no backup, don't ask). The Exchange server is still alive, but management and such can't load. I have access to the .edb file and logs. I would like to…
Rasmus
1
vote
0 answers

Zimbra 7.1.4 disaster recovery within the LDAP service

We have a disaster on our Zimbra 7.1.4 server. The server is (was) running on a CentOS 6 box and it failed last Friday. After a lot of effort to make the machine boot again, Zimbra fails to start with a lot of errors. The first one…
Vinícius Ferrão
1
vote
1 answer

SQL Server Sync Jobs Between Primary and Secondary Server

I am setting up a Standby Server for disaster recovery and high availability reasons. My product relies heavily on SQL Jobs; as such, these jobs need to exist (Disabled) on the Standby server. I have the database replication set up; however, I can't…
1
vote
3 answers

Replicate EMC LUN to AWS. Is it possible?

Is it possible to replicate a LUN from our EMC VNX5300 to Amazon Web Services? This would be for a disaster recovery scenario. I've talked to AWS and they said to talk to EMC. I've talked to several EMC employees, one of them being an engineer. None…
travmi
1
vote
1 answer

Using duplicity in a disaster recovery scenario

I am using duplicity for backups on my Debian servers. I have successfully backed up and restored files using duplicity. In my backup script, I have /usr/bin/dpkg --get-selections > /installed_packages_ so that I can backup the package…
AWippler
1
vote
4 answers

SQL Server DR Strategies with virtualisation & multi sites

I'm looking for advice on the best HA/DR strategies for SQL Server. Currently I'm using Express edition with backups being copied over the WAN to a remote site and being restored. The SQL Express instances are hosted on a virtualised server…