Questions tagged [disaster-recovery]

Disaster recovery and preparedness is an unfortunate aspect of systems administration. This tag should be used for help with planning, implementation and best practices related to recovering from a catastrophic event on a server or in a datacenter environment.

Recovering from an unplanned, catastrophic outage is a painful process whether you are managing a single server or an entire datacenter. Roof leaks, broken water lines, power outages and any number of other events can take what was a great day and turn it into a living nightmare when you are responsible for keeping systems others rely on available.

The key to recovering from any disaster is preparedness. Knowing the steps required to bring the network and systems back online is critical. Before you can properly prepare for a disaster, you need to understand the risks, bottlenecks and other critical components of the overall system, e.g. who controls the power, internet connection, etc. at your site. Knowing which aspects of disaster recovery are within your control is very important when planning; if no one on staff can fix the power, HVAC, etc., make sure the contact information for someone who can is written down somewhere it can be found during an outage. Having this information gathered before a disaster occurs will help keep everyone calm, cool and on task when something actually does happen.

Once the risks are assessed and a plan is created, print physical copies, email it, and make sure everyone with admin-level access to the systems/datacenter has read it and is familiar with it. The best plan in the world is worthless if it lives on a system that is down and cannot easily be restored without following the plan. After everyone is familiar with the plan, practice it when possible; in many situations that may not be realistic, but where it is, take advantage of planned downtime or natural outages to walk through the recovery plan and refine it.

In summary, when a disaster happens:

  1. Don't Panic! Panic turns a debacle into a catastrophe every time.
  2. Plan ahead, understand the risks, and know what is within your control
  3. Follow the plan but be flexible; a recovery plan is more of a jazz tune than a military march
  4. Stay calm and organized, use checklists, keep notes
  5. If you are working in a team or group, communicate and collaborate
  6. Be vigilant, update your plan as the environment changes
  7. Check your backups, make sure they happen at regular intervals and that the data contained therein is still good.
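Point 7 can be partly automated. The sketch below is one possible approach, not a standard tool: it assumes a hypothetical setup where each backup is a single file and its SHA-256 checksum was recorded when the backup was written. It flags a backup that is missing, older than an allowed interval, or no longer matches its recorded checksum. A matching checksum only shows the file is unchanged since it was written; it does not prove the backup is restorable, so periodic test restores are still needed.

```python
import hashlib
import time
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def check_backup(path: Path, expected_sha256: str, max_age_hours: float) -> list[str]:
    """Return a list of problems found; an empty list means both checks passed.

    expected_sha256 is assumed to come from a manifest written at backup time
    (a hypothetical convention, not part of any particular backup product).
    """
    if not path.is_file():
        return [f"{path}: backup file is missing"]
    problems = []
    # Regular intervals: the newest backup should not be older than the schedule allows.
    age_hours = (time.time() - path.stat().st_mtime) / 3600
    if age_hours > max_age_hours:
        problems.append(f"{path}: last written {age_hours:.1f} h ago (limit {max_age_hours} h)")
    # Data still good: contents must match what was recorded when the backup ran.
    if sha256_of(path) != expected_sha256:
        problems.append(f"{path}: checksum does not match the recorded value")
    return problems
```

Run from a scheduler (cron, Task Scheduler) against each backup, it can alert whenever the returned list is non-empty.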
358 questions

0 votes, 2 answers

How to repair Ubuntu 8.04.4 after uninstalling many things by mistake

While trying to install rmagick and failing, I thought I would uninstall libfreetype6 and then reinstall it. When I entered sudo apt-get remove libfreetype6, it warned that the following packages would be removed; the list was long and included…
umar

0 votes, 1 answer

Disaster recovery advice

Possible Duplicate: Disaster recovery plan development best practices or resources? We are in the process of starting an internal evaluation of the disaster recovery procedures for our datacenter. Can you suggest any good book/site that you…

0 votes, 2 answers

NIC not working after TEST SBS 2003 restore to dissimilar hardware

Restored an SBS 2003 backup file to different hardware and rebooted; things took a while. I tidied up the drivers and such, removed the ghost NIC in Device Manager, etc., and re-ran the connection wizard for internet without issues. I hard coded the ip…
dasko

0 votes, 3 answers

How likely is it that my data can be recovered after Windows CHKDSK ran on a degraded RAID 5 array?

We have a RAID 5 setup with 3 SATA disks; disk #2 went down, as reported on the pre-POST screen. Unfortunately, for reasons beyond my control, the system was rebooted with a degraded RAID :-O Windows XP (64-bit) loaded, CHKDSK ran automatically and…
user41653
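Why a degraded RAID 5 array still boots with all of its data can be shown with the parity arithmetic alone. The sketch below models one stripe of a hypothetical 3-disk array: parity is the byte-wise XOR of the data blocks, so any single missing block is the XOR of the surviving ones. Real controllers rotate parity across the disks and handle far more detail; this illustrates only the math, not what any specific controller or CHKDSK did. One consequence: while degraded, writes (including CHKDSK repairs) are applied to the surviving blocks and parity, so they become part of the array's contents.

```python
from functools import reduce


def xor_blocks(*blocks: bytes) -> bytes:
    """Byte-wise XOR of equal-length blocks (the RAID 5 parity operation)."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# One stripe on a hypothetical 3-disk RAID 5: two data blocks plus parity.
disk1 = b"DATA-ONE"
disk2 = b"DATA-TWO"
parity = xor_blocks(disk1, disk2)  # stored on the third disk for this stripe

# Disk 2 fails: its block is recomputed on the fly from the survivors.
rebuilt = xor_blocks(disk1, parity)
print(rebuilt)  # b'DATA-TWO'
```

The same XOR recovers disk 1 from disk 2 and parity, which is why RAID 5 tolerates exactly one failed disk and no more.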

0 votes, 5 answers

Email continuity services (e.g. MessageLabs): caveats, lessons learned, gotchas?

I am in the process of reviewing some email continuity solutions such as the one offered by MessageLabs. Solutions such as this are not cheap; however, I believe they reduce administrative complexity and serve as a feasible DR type…
molecule

0 votes, 1 answer

Is it possible to configure TMG to impersonate a domain user for anonymous requests to a website?

I would like to configure Forefront Threat Management Gateway (formerly ISA server) to impersonate a specific domain user for any anonymous request to a particular listener. For example, for any anonymous request to http://www.mycompany.com, I…
Daniel

0 votes, 2 answers

How do I cancel a Windows Server 2003 repair install?

System: Windows Server 2003 Enterprise. Scenario: the NTDS database is corrupt and all attempts to fix it with esentutl fail. Ran chkdsk, which seemed to repair a disk error and give access to the ntds.dit file, but esentutl still fails. (Attached the drive to a…

0 votes, 1 answer

vCenter re-install

I am reviewing our DR documentation now that we have moved to a VMware vSphere environment. My question concerns the vCenter software: what is the best practice for re-installing it? We have two VMware clusters (one at main office and…
peanutp

0 votes, 1 answer

How to reroute from one subdomain to another in AWS Route53?

I have two hosted zones (main.mydomain.com and backup.mydomain.com). They contain similar records (more than 100 of them). The records differ only in their values, i.e. where they route traffic to. The main subdomain records route to my resources in my main AWS region…

0 votes, 0 answers

ADO on-premises recovery scenario (domain controller, app tier + db tier)

We are thinking about disaster recovery scenarios for one of our services, and I currently lack the knowledge or sources to investigate further. We have an Azure DevOps on-premises server that has its application tier on a different machine than…

0 votes, 1 answer

Multi-Master K8S cluster fails when half of the masters are down

I have a 4-master HA K8S cluster (across 2 datacenters, 2 in each site), but the kubectl command stops working after shutting down 2 masters. Is this the expected behaviour? I want the cluster to survive a datacenter crash. PS: I am using 2…
Kratozz
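The behaviour in the question matches majority-quorum arithmetic. Assuming the common stacked layout where each master also runs an etcd member (an assumption; the question does not say), etcd's Raft consensus needs a strict majority, floor(n/2) + 1 members, to keep accepting writes, and the API server that kubectl talks to depends on it. The sketch below checks whether a given per-site layout survives the loss of any single site:

```python
def quorum(members: int) -> int:
    """Smallest strict majority of a Raft cluster such as etcd."""
    return members // 2 + 1


def survives(members_per_site: list[int], failed_site: int) -> bool:
    """True if the members outside failed_site still form a majority."""
    total = sum(members_per_site)
    remaining = total - members_per_site[failed_site]
    return remaining >= quorum(total)


for layout in ([2, 2], [3, 2], [2, 2, 1]):
    total = sum(layout)
    ok = all(survives(layout, site) for site in range(len(layout)))
    print(f"{layout}: {total} members, quorum {quorum(total)}, survives any one site loss: {ok}")

# [2, 2]: 4 members, quorum 3, survives any one site loss: False
# [3, 2]: 5 members, quorum 3, survives any one site loss: False
# [2, 2, 1]: 5 members, quorum 3, survives any one site loss: True
```

With two sites, whichever site holds the majority is a single point of failure; only a third location (even one holding a single member as a tie-breaker) lets the cluster survive losing any one datacenter.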

0 votes, 1 answer

LVM recovery with RAID reconfiguration

I have a Dell R510, previously with an H200 controller, and two disks: a 2T SAS and a 250G SSD. I created two RAID 0 volumes using those two disks on the H200 and then created LVM on top of them. Here is my procedure: created a vg-data using entire raid0…

0 votes, 1 answer

Disaster Recovery: PXE boot Windows Server 2019 from ISO

I have a dedicated physical Windows Server 2019 on the Ionos cloud platform. I need to implement a disaster recovery plan. I also have added a shared storage block on another Ionos server and mapped the shared storage segment as a network drive on…

0 votes, 0 answers

Rescue Tip: Wrong library file installation damaged Ubuntu 18.04 after upgrade

I had an old Ubuntu 14.04 linode.com instance that I attempted to upgrade to 18.04. Some dependencies were somehow not upgraded, so I attempted to install the corresponding deb file from the Ubuntu server (as it was not in the repo). The following…
TELA

0 votes, 1 answer

If I image a virtual machine with SQL server installed will the database work on a new instance?

On AWS I've always imaged machines that contained SQL Server installations and running databases. When I spin up new instances from the images, SQL Server and all databases have always come back without issue (e.g. consistency checking always came back…