We are thinking about disaster recovery scenarios for one of our services, and I currently lack the knowledge or sources to investigate further.
We have an on-premises Azure DevOps Server whose application tier runs on a different machine than the database tier.
Production network
========
app1
db1
To be able to authenticate app1 against db1, we follow the Microsoft-recommended way of joining these machines to a domain (which appears to be the only way). Now the setup is:
Production network
========
app1
db1
domaincontroller
If there is a data loss and recovery scenario, there are two situations (from the point of view of the ADO server): full and partial recovery.
A full recovery of app1 and db1 is trivial. You restore the last VM snapshots of app1 and db1 (or use ADO's internal database backup and restore). You may lose some data, but as long as the domain controller is running, everything else is fine.
Now the thing I cannot wrap my head around is: what happens if there is partial data loss on app1 and db1? Both servers are working for most users, but some users experienced data loss and need recovery. Say we do not want to interrupt most users and will recover work items or source code without history. Is this even possible?
We cannot restore an older snapshot of app1 and db1 in the production network because this would impact all users not affected by the original data loss.
So IMHO the recovery scenario would require restoring a snapshot of app1 and db1 in an isolated environment.
Production network
=================
app1
db1
domaincontroller
Isolated network
=================
app_restored
db_restored
But do I actually need to restore the domain controller into the isolated network as well? Since the app tier and the db tier need a domain controller to communicate, there is really no way around restoring a domain controller too, am I right? (I don't actually know whether restoring a domain controller in an isolated network is easy or even possible.)
This seems to be a very important constraint (if true) that effectively limits the ability to restore services that way.
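To make my assumption explicit, here is a minimal sketch in Python of how I currently picture the dependencies. The names `DEPENDS_ON` and `restore_set` are hypothetical and only for illustration, and the map encodes my assumption that both tiers need the domain controller, which is exactly the part I am unsure about.

```python
from collections import deque

# Assumed dependency map, taken from the setup described above.
# Whether both tiers really need the domain controller after a
# restore is the open question, not an established fact.
DEPENDS_ON = {
    "app1": {"db1", "domaincontroller"},
    "db1": {"domaincontroller"},
    "domaincontroller": set(),
}


def restore_set(targets, depends_on=DEPENDS_ON):
    """Return every machine that must exist in the isolated network
    for the given targets to work, following dependencies transitively."""
    needed = set()
    queue = deque(targets)
    while queue:
        machine = queue.popleft()
        if machine in needed:
            continue  # already visited
        needed.add(machine)
        queue.extend(depends_on.get(machine, ()))
    return needed


print(sorted(restore_set(["app1"])))
```

If the assumption holds, restoring the app tier alone pulls in all three machines, which is why the domain controller looks unavoidable to me.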