Cost-reasonable failover architecture between two datacenters

Question

I would like to get your thoughts on an architecture to handle failover between two application servers hosting multiple applications (.NET-based websites, SQL Server) in Switzerland. The goals are limited downtime when switching over to the secondary server after a failure (< 2 hours), reasonable costs, and minimal human intervention to recover.

Remarks:

DNS failover is not an option for us, because we don't manage the DNS for every application hosted on our application server.
We would like to avoid hosting our application server on a managed VM, so that we can manage licences and VM transfers ourselves and get high I/O performance.

At this stage, my plan is to use the following setup:

Nginx hosted on Azure Switzerland (F2sv2: 2 vCPUs, 4 GB RAM)
Server A: Dell R6515 dedicated server in a Tier IV datacenter in Switzerland (active instance, to be purchased)
Server B: Dell R6515 dedicated/development server in our own office infrastructure in Switzerland (1 Gb/s connection) (backup instance, already purchased)

Nginx (80 EUR/month, no cost for us as an MS Partner)

The main purpose of Nginx is to let us switch traffic from Server A's IP to Server B's IP in case of a failure. The DNS records of all application services will point to the Nginx IP. Since it is hosted on Azure, it should be redundant by design if the Nginx VM itself fails. There may also be Azure features to ensure redundancy for Nginx.
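
The switching role described above amounts to a priority-ordered pool of backends with health checks (nginx's `upstream` block can express this with a `backup` server). The Python sketch below only illustrates that decision logic, not nginx itself: it assumes a hypothetical threshold of three consecutive failed health checks before a backend counts as down, and routes to the first backend still considered up. The backend names are placeholders.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Backend:
    name: str           # hypothetical label, e.g. "server-a"
    max_fails: int = 3  # assumed number of consecutive failed checks before "down"
    fails: int = 0      # consecutive failed health checks so far

    @property
    def up(self) -> bool:
        return self.fails < self.max_fails

    def record_check(self, healthy: bool) -> None:
        # A successful check resets the counter; a failed one increments it.
        self.fails = 0 if healthy else self.fails + 1


def choose_backend(backends: list[Backend]) -> Backend | None:
    """Return the first backend, in priority order, that is still considered up."""
    for backend in backends:
        if backend.up:
            return backend
    return None  # every backend is down: serve a maintenance page instead


if __name__ == "__main__":
    server_a = Backend("server-a")  # primary in the Tier IV datacenter
    server_b = Backend("server-b")  # backup in the office
    pool = [server_a, server_b]

    print(choose_backend(pool).name)  # server-a
    for _ in range(3):                # three failed health checks on Server A
        server_a.record_check(False)
    print(choose_backend(pool).name)  # server-b
```

In this particular setup, switching back automatically as soon as Server A answers again would be risky, because after a failover the current data lives on Server B; step 5 of the failure procedure treats failback as a deliberate step.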

VMware Essentials (~600 EUR)

We currently own a VMware Essentials licence, which allows us to run up to three ESXi hosts. Server A and Server B will be ESXi hosts registered in vSphere. Server A will host the active VM running our application server.

Veeam Community Edition (free)

We plan to use Veeam Community Edition to replicate the application server VM image and the SQL Server transaction logs from Server A to Server B.

Server A Hardware / Datacenter Failure

So far, I imagine the following procedure in case of a hardware or datacenter failure at Server A:
1. Configure Nginx to serve a maintenance page.
2. Restore the application server VM image on Server B using Veeam.
3. Restore the transaction logs on Server B using Veeam.
4. Change the Nginx configuration to redirect traffic from Server A to Server B.
5. When Server A is available again, repeat the procedure in the other direction, from Server B to Server A.
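
Steps 2 and 3 come down to restoring a full backup followed by every transaction log backup, in the order they were taken. Whatever tool drives it, the underlying SQL Server restore chain looks roughly like the sketch below: every step except the last uses `WITH NORECOVERY` so further logs can still be applied, and only the final step uses `WITH RECOVERY`, which brings the database online. The database name and file paths are hypothetical, and the script only prints T-SQL; it does not connect to SQL Server.

```python
from __future__ import annotations


def restore_chain(database: str, full_backup: str, log_backups: list[str]) -> list[str]:
    """Build T-SQL statements restoring a full backup plus its log chain.

    log_backups must be in the order the log backups were taken.
    Every statement except the last uses NORECOVERY so more logs can be
    applied; the last one uses RECOVERY to bring the database online.
    """
    steps = [("DATABASE", full_backup)] + [("LOG", path) for path in log_backups]
    statements = []
    for i, (kind, path) in enumerate(steps):
        option = "RECOVERY" if i == len(steps) - 1 else "NORECOVERY"
        statements.append(
            f"RESTORE {kind} [{database}] FROM DISK = N'{path}' WITH {option};"
        )
    return statements


if __name__ == "__main__":
    for stmt in restore_chain(
        "AppDb",                       # hypothetical database name
        r"D:\Backups\AppDb_full.bak",  # hypothetical full backup file
        [r"D:\Backups\AppDb_log_0100.trn", r"D:\Backups\AppDb_log_0115.trn"],
    ):
        print(stmt)
```

A single missing log file in the sequence breaks the chain, so it is worth checking that the standby really receives every log backup, not just the most recent one.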

As I am no infrastructure/VMware expert, I would love to hear your thoughts on this architecture, or any proposal that would help us achieve our goal. At this stage, I am wondering what the latency/performance impact would be of routing every request through Nginx to the application server, even though both are located close to each other.

Thank you for your advice!

Gilles

Any inexpensive changeover mechanism where the servers are on different networks will require either access to DNS, the ability to easily reprogram the hosts, or hosts that automatically keep trying between two or more servers. (The alternative is dynamic routing, which means BGP and an expensive network setup and management.) What's your poison? — davidgo, Mar 28 '20 at 02:03
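
The third option in that comment, clients that keep trying between two or more servers, is a try-in-order loop on the client side. A minimal Python sketch, assuming a hypothetical `send` callable that raises `ConnectionError` when a server is unreachable; the server names are placeholders:

```python
from __future__ import annotations

from typing import Callable, Sequence


def call_with_failover(servers: Sequence[str], send: Callable[[str], str]) -> str:
    """Try each server in order and return the first successful response."""
    errors: list[str] = []
    for server in servers:
        try:
            return send(server)
        except ConnectionError as exc:
            # Remember why this server failed, then fall through to the next one.
            errors.append(f"{server}: {exc}")
    raise ConnectionError("all servers failed: " + "; ".join(errors))


if __name__ == "__main__":
    def fake_send(server: str) -> str:
        # Stand-in for a real request: pretend Server A is down.
        if server == "server-a.example":
            raise ConnectionError("timed out")
        return f"200 OK from {server}"

    print(call_with_failover(["server-a.example", "server-b.example"], fake_send))
```

This only applies where you control the client code.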

Zac67 · Answer 1 · 2020-03-27T13:16:29.910

I wouldn't exactly call this concept "cost reasonable"; it's more "cost avoiding"...

Assuming you're using nginx as a proxy, requests aren't redirected there; they flow through that server. That may quickly become a bottleneck and delay responses in general. Additionally, you'd be 100% dependent on the Azure cloud; that may be cheap, but an infrastructure of your own can likely provide better availability. Double-check the Azure SLA and the fine print!

Also, I wouldn't want to rely on replicating SQL data through whole-VM replication; you should consider (continuously) replicating data at the SQL level as well, reducing the window for data loss.
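
To make "the window for data loss" concrete: with periodic copies, the worst case is losing everything written since the last copy that actually reached the standby, i.e. up to one full copy interval plus the time a copy takes to arrive. A rough Python sketch; the intervals below (a nightly VM replication versus 15-minute log backups) are hypothetical, not figures from the question:

```python
from datetime import timedelta


def worst_case_data_loss(copy_interval: timedelta, transfer_delay: timedelta) -> timedelta:
    """Upper bound on lost writes with periodic copies to a standby.

    If the primary fails just before the newest copy finishes arriving,
    everything since the previous copy is lost: one full interval plus
    the transfer delay (assuming the delay is shorter than the interval).
    """
    return copy_interval + transfer_delay


if __name__ == "__main__":
    nightly_vm = worst_case_data_loss(timedelta(hours=24), timedelta(hours=1))
    log_shipping = worst_case_data_loss(timedelta(minutes=15), timedelta(minutes=2))
    print(nightly_vm)    # 1 day, 1:00:00
    print(log_shipping)  # 0:17:00
```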

edited Mar 27 '20 at 13:16

answered Mar 27 '20 at 12:58

Actually, Azure is likely the better solution for uptime, but I doubt there are no traffic costs at some point. But yes, the SQL Server high availability approach is "you are fired"-level bad: never replicate VMs when you can handle that within SQL Server. – TomTom Mar 27 '20 at 13:19
Thanks for your answers. Regarding Azure, the SLA is 99.9% for the planned VM. I'm uncomfortable with the potential bottleneck of routing through Nginx on Azure... What would be my alternative options for failover between my two DCs? – Gilles F Mar 27 '20 at 14:34
Regarding Veeam, it seems to offer consistent SQL transaction log backup alongside the VM image through the Veeam Availability Suite (https://www.veeam.com/blog/how-to-back-up-a-sql-server-transaction-log.html). It works as a SQL transaction log backup running directly on SQL Server. Using Veeam for SQL backup would give us a single point of entry for backup and recovery management. – Gilles F Mar 27 '20 at 14:35
@GillesF 99.9% availability means up to 0.1% downtime; per year, that's 8:45 hours, clearly outside your 1-hour window. Re Veeam application-awareness: they're talking about *backup* and SQL transaction integrity. I was referring to *tight* replication within SQL, with a very small loss window, which you likely won't be able to achieve with full guest replication. Veeam is not a bad choice, but it might not fully cover your data loss/integrity requirements (which you haven't detailed, btw). – Zac67 Mar 27 '20 at 16:40
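
The 8:45 figure follows from multiplying the allowed unavailable fraction by the length of a year: (1 - 0.999) × 8,760 h = 8.76 h, or about 8 h 45 min. A small Python helper, assuming a 365-day year, for comparing other SLA levels:

```python
from datetime import timedelta


def max_downtime_per_year(availability: float) -> timedelta:
    """Downtime permitted per 365-day year by an availability fraction (e.g. 0.999)."""
    return timedelta(days=365) * (1 - availability)


if __name__ == "__main__":
    for sla in (0.999, 0.9995, 0.9999):
        print(f"{sla:.2%} -> {max_downtime_per_year(sla)}")
```

For 99.9% this gives 8:45:36. Note that an SLA figure bounds the yearly total, not the length of any single outage.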