I intend to create a SaaS with two load balancers. If one load balancer goes down, 'MySaaSApp.com' should point to the other one. Can this be accomplished through DNS records alone? Thanks!
2 Answers
No. Or not in general.
The DNS does load balancing by default, not failover. If you have multiple records for a given name and record type, applications are supposed to use all of them. But since the records form a set, not a list, there is no guarantee of order. And records are cached, so applications have no "immediate" way to fail over from a broken address to a working one.
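To illustrate what "use all of that data" means on the client side, here is a minimal sketch in Python. The function names (`resolve_all`, `first_reachable`, `tcp_connects`) are made up for this example, and the connection check is passed in as a parameter so the selection logic can be exercised without a network.

```python
import socket
from typing import Callable, Iterable, List, Optional


def resolve_all(host: str, port: int) -> List[str]:
    """Return every distinct address the resolver hands back for host.

    The order is whatever the resolver returned; it carries no meaning.
    """
    infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    addresses: List[str] = []
    for _family, _type, _proto, _canonname, sockaddr in infos:
        if sockaddr[0] not in addresses:
            addresses.append(sockaddr[0])
    return addresses


def tcp_connects(address: str, port: int, timeout: float = 2.0) -> bool:
    """Report whether a TCP connection to address:port can be opened."""
    try:
        with socket.create_connection((address, port), timeout=timeout):
            return True
    except OSError:
        return False


def first_reachable(
    addresses: Iterable[str],
    port: int,
    try_connect: Callable[[str, int], bool] = tcp_connects,
) -> Optional[str]:
    """Try each address in turn and return the first that accepts a connection."""
    for address in addresses:
        if try_connect(address, port):
            return address
    return None
```

A client would call `first_reachable(resolve_all("example.com", 443), 443)`. Note that the failover here lives entirely in the application; the DNS only supplied the set of candidates.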
But there are exceptions. `MX` records are ordered through a priority field, so applications (that is, SMTP clients or MUA/MTAs) are expected to switch automatically to the second record if the first one fails, and so on. But this is because it is baked into the specification of the `MX` record, and because the record carries a specific piece of information for it, the priority.
Same for `SRV` records, which allow both load balancing and failover... as long as applications are developed to use those records. That is not the case with web browsers, for reasons that are by now historical, and which ought to be solved in the future with the new `HTTPS` and `SVCB` record types.
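As a rough illustration of how a record-level priority turns into failover, the following Python sketch orders a set of `MX` records the way a sending client would try them. The `MX` tuple, the `delivery_order` function, and the host names are hypothetical and only serve this example; real senders are also expected to randomize among records of equal preference, which this sketch does not do.

```python
from typing import Iterable, List, NamedTuple


class MX(NamedTuple):
    preference: int  # lower value means "try this one first"
    exchange: str    # mail server host name


def delivery_order(records: Iterable[MX]) -> List[str]:
    """Return the exchanges in the order a client should attempt them.

    Ties are left in input order here (sorted() is stable).
    """
    return [r.exchange for r in sorted(records, key=lambda r: r.preference)]


records = {
    MX(20, "backup-mail.example.com."),
    MX(10, "mail.example.com."),
}
print(delivery_order(records))
```

The record set itself is unordered (it is a Python `set` above on purpose); the ordering comes only from the preference value carried inside each record.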
Load balancers in hot/hot or hot/cold configurations have specific protocols to resolve those issues. You typically use either IP anycast with some clever BGP routing, or some "virtual" IP address that is floating between various systems, which needs to be synchronized to make sure the IP address is claimed only by one at any single point in time. Look at VRRP for example, or at https://www.haproxy.com/blog/failover-and-worst-case-management-with-haproxy/ for an example when using HAProxy.

To counter Patrick’s answer: although the DNS protocol has no native failover mechanism, quite a few DNS providers do support changing DNS records automatically, among other ways by tying a health check to a record.
Then you can configure example.com to resolve to `example.com 60 IN A 10.0.0.1` as long as a web request to `https://10.0.0.1/healthcheck` results in a 200 response. When it doesn’t, the record falls back to `example.com 60 IN A 10.9.8.1` or similar.
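Conceptually, the provider-side logic looks something like the Python sketch below. This is not any provider's actual API: `http_ok` and `answer_for` are names invented for the example, the `/healthcheck` path and both addresses are taken from the records above, and the health check is injectable so the selection can be tried without a live server.

```python
import urllib.request
from typing import Callable

PRIMARY = "10.0.0.1"    # address from the example record above
SECONDARY = "10.9.8.1"  # fallback address from the example above
TTL = 60                # seconds, as in the example records


def http_ok(address: str, timeout: float = 5.0) -> bool:
    """Return True if https://<address>/healthcheck answers with HTTP 200.

    Any network, TLS, or HTTP error counts as unhealthy.
    """
    url = f"https://{address}/healthcheck"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except OSError:  # URLError and HTTPError are subclasses of OSError
        return False


def answer_for(name: str, is_healthy: Callable[[str], bool] = http_ok) -> str:
    """Build the A record to publish, depending on the primary's health."""
    target = PRIMARY if is_healthy(PRIMARY) else SECONDARY
    return f"{name} {TTL} IN A {target}"
```

In a real setup the provider runs the check on a schedule from several locations and updates the published record itself; the sketch only shows the decision.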
Note that DNS records must come with a time-to-live (TTL) value (60 seconds in the example above) that governs how long resolvers should cache the record. The shorter the TTL, the faster cached records expire and the faster the failover should occur, but also the more load on your name server and some additional latency.
And be aware that not every resolver properly honors TTLs, meaning that in case of failover some of your users will keep resolving to the old IP address for much longer than you designed for...
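A back-of-the-envelope way to reason about this is sketched below in Python. The function name and all the numbers other than the 60-second TTL are assumptions chosen for illustration: a check every 30 seconds, three failed checks before the record is switched, and a hypothetical resolver that refuses to cache for less than 300 seconds.

```python
def worst_case_failover_seconds(
    check_interval: int,
    failures_needed: int,
    ttl: int,
    resolver_min_ttl: int = 0,
) -> int:
    """Upper bound on how long a client may keep using the dead address.

    detection time: the health check must fail `failures_needed` times in a row
    cache time:     a resolver that cached the old record just before the switch
                    keeps it for the TTL, or for its own minimum if that is larger
    """
    detection = check_interval * failures_needed
    effective_ttl = max(ttl, resolver_min_ttl)
    return detection + effective_ttl


# Resolver that honors the 60-second TTL from the example record.
print(worst_case_failover_seconds(30, 3, 60))        # 150
# Resolver that clamps TTLs to an assumed 300-second floor.
print(worst_case_failover_seconds(30, 3, 60, 300))   # 390
```

This ignores applications that resolve the name once at startup and never again, for which no TTL setting helps.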
See for example https://docs.aws.amazon.com/Route53/latest/DeveloperGuide/resource-record-sets-values-failover.html

- "quite a few DNS providers do support changing DNS records by among others tying a health check to a record." So this is something outside of the DNS. Some process monitors things and then changes the records. Of course that is possible, but it would be possible no matter which protocol is used, DNS or not, so it is not an intrinsic property of the protocol. – Patrick Mevzek Mar 31 '21 at 15:10
- "The shorter the TTL the faster cached records expire" Life is unfortunately not that simple. Some may believe you can then use TTLs as short as 1 second to emulate failover, but it doesn't work for at least two reasons: 1) various resolvers will refuse such low values, and basically below 5 minutes you cannot count on TTLs being honored (however much that violates the standard, it is what happens) and 2) applications have to be specially programmed to request the resolution again; some will do the resolution on start and never later unless forced to. – Patrick Mevzek Mar 31 '21 at 15:12
- Thank you for reiterating the points I already made. And despite all those valid remarks, ***in practice*** and for many use cases and users ***this does work*** remarkably well. – Bob Mar 31 '21 at 20:53
- "this does work remarkably well." If you are happy with delays and difficult troubleshooting, yes, most certainly, plus a need to adapt all applications. But again, this is still neither how the DNS was designed, nor how it works. For example, it was designed for failover on `NS` records, as resolvers are expected to switch to another nameserver if one doesn't reply. This is not an emergent property of the protocol, but a specific feature of a specific record type, `NS`. As I said in my answer, some other record types have similar features, but it is per record type, not at the protocol level. – Patrick Mevzek Mar 31 '21 at 22:28