I'm curious about the differences and considerations when implementing email server load balancing with these three methods:

  1. Multiple MX records with the same priority/preference number
  2. Multiple A and/or AAAA records for the domain name which the MX record(s) point to
  3. Using an anycast IP for the IP(s) in the MX record(s) or the A/AAAA record(s)

Which single method or combination of these is considered the best practice and when should you use them?

Are there any other ways to load balance email servers besides these and using an email gateway?

In my experience, 1 and 2 are used quite often, while 3 is less common and mostly seen with CDNs.

Stuggi
  • Servers will try all MX records but only a single A record. I would argue using multiple MX records is the best practice as it's designed into the SMTP spec. – davidgo Jun 11 '20 at 12:25
  • @davidgo "Servers will try all MX records but only a single A record." Really? No. Servers try all MX records of the same priority first, starting with the lowest preference number, and move on to the others only after those have failed. As for `A`/`AAAA`, all sane applications get all the possible values and try them one by one until they reach a working one. "I would argue using multiple MX records is the best practice as it's designed into the SMTP spec." It is designed more for fault tolerance than for load balancing. `SRV` records, by contrast, are designed for both. – Patrick Mevzek Jun 11 '20 at 14:55
  • @PatrickMevzek Agree re MX records being tried in groups based on priority - but one at a time in a random fashion - the OP stated "of the same priority" - but the order of these is randomised as required by RFC 5321, section 5.1, paragraph 7. This same paragraph specifically states "the sender-SMTP MUST randomize them to **spread the load** ..." SRV records are not part of the SMTP spec and are not useful in answering the questions. (I'll address A records separately) – davidgo Jun 11 '20 at 23:50
  • @PatrickMevzek I concede that according to the same RFC (paragraph 5) multiple A records should be tried if available. I confess I learnt quite a bit about A records this morning which I would have thought I'd have known. Thank you! – davidgo Jun 12 '20 at 00:06
  • I gave the `SRV` example to illustrate that, contrary to `MX` records, which were mostly designed just for failover, `SRV` records were designed for both failover and load balancing from the get-go. This implies, in a way, that you typically do not do load balancing through `MX` records, even though you technically can. And you don't do load balancing through `A`/`AAAA` records either, because clients get the records in random order, typically pick one, try it, and go to the second one only if there are problems (or do something more elaborate, as in the Happy Eyeballs algorithm). – Patrick Mevzek Jun 12 '20 at 00:27
  • (and to be pedantic, `MX` records are not mandatory for SMTP as there is a fallback to `A` records by design; but it is better to use them) – Patrick Mevzek Jun 12 '20 at 00:28

1 Answer

This is a subjective question, and the answer also depends on the infrastructure you control.

With MX records you can steer senders toward a destination only to a limited degree. The randomization is left to the DNS resolver or even the client. They should spread the load evenly, but they sometimes cache the randomized order for later requests. So you rely on the client side or an intermediate DNS server to do the load balancing for you, and you can only hope that they do.
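To make the sender-side behaviour concrete, here is a minimal sketch of how a sending server might order MX targets, assuming it follows the RFC 5321 section 5.1 rule mentioned in the comments: lower preference numbers first, random order among equal preferences. The function name `delivery_order` and the `example.com` hostnames are made up for illustration, and real MTAs differ in the details.

```python
import random
from itertools import groupby


def delivery_order(mx_records, rng=random):
    """Return MX hosts in the order a sender would try them.

    mx_records is a list of (preference, hostname) tuples.
    Lower preference numbers are tried first; hosts that share a
    preference are shuffled, which is where the load spreading comes from.
    """
    ordered = []
    by_preference = sorted(mx_records, key=lambda record: record[0])
    for _, group in groupby(by_preference, key=lambda record: record[0]):
        hosts = [host for _, host in group]
        rng.shuffle(hosts)  # random order within one preference level
        ordered.extend(hosts)
    return ordered


# Hypothetical zone: two equal-priority exchangers plus one backup.
records = [
    (10, "mx1.example.com"),
    (10, "mx2.example.com"),
    (20, "backup.example.com"),
]
print(delivery_order(records))
```

The backup host always comes last, while `mx1` and `mx2` swap places from call to call. Note that the shuffle happens entirely on the sender's side, which is exactly why the receiving domain cannot enforce it.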

The same goes for the A records. You do not control how the client side interprets the DNS answer. Will they behave as you expect? Do they follow the RFCs? Do they randomize or always start with the first one? You are not in control.
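A rough simulation shows why this matters. It assumes a mix of two hypothetical client types: "naive" senders that always use the first address in the answer, and senders that pick one at random. The function name `simulate`, the documentation-range addresses, and the fractions are all invented for illustration; real resolvers may also rotate the answer order, which this sketch ignores.

```python
import random
from collections import Counter


def simulate(addresses, senders, naive_fraction, seed=0):
    """Count connections per address for a mixed population of senders.

    naive_fraction is the share of senders that always use the first
    address in the list; the rest choose uniformly at random.
    """
    rng = random.Random(seed)
    hits = Counter({address: 0 for address in addresses})
    for _ in range(senders):
        if rng.random() < naive_fraction:
            hits[addresses[0]] += 1  # never looks past the first record
        else:
            hits[rng.choice(addresses)] += 1
    return hits


addresses = ["192.0.2.1", "192.0.2.2", "192.0.2.3"]
for fraction in (0.0, 0.5, 1.0):
    print(fraction, dict(simulate(addresses, 10_000, fraction)))
```

The more senders ignore the randomization, the more traffic piles up on the first address, and nothing on the receiving side can correct for it.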

If you have multiple servers behind one IP in a HA cluster, then you are in control. You know how to load balance the incoming traffic, based on load or other criteria. But your bottleneck will be the bandwidth to the border machine which distributes the traffic.
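As one example of "based on load", the border machine could hand each new connection to the backend with the fewest open connections. The sketch below is only an illustration of that idea. The class name `LeastConnectionsBalancer` and the backend names are hypothetical, and a real deployment would use an existing load balancer rather than hand-written code.

```python
class LeastConnectionsBalancer:
    """Pick the backend that currently has the fewest active connections."""

    def __init__(self, backends):
        # Active connection count per backend, all starting at zero.
        self.active = {backend: 0 for backend in backends}

    def acquire(self):
        """Choose a backend for a new connection and count it as active."""
        backend = min(self.active, key=self.active.get)
        self.active[backend] += 1
        return backend

    def release(self, backend):
        """Mark one connection to the backend as finished."""
        self.active[backend] -= 1


balancer = LeastConnectionsBalancer(["smtp-a", "smtp-b", "smtp-c"])
first = balancer.acquire()
second = balancer.acquire()
balancer.release(first)
print(first, second, balancer.acquire())
```

Unlike the DNS-based methods, the decision here is made by a machine you operate, so it can react to the actual state of the servers.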

So you can combine everything as you like. If you need multiple data centers to receive mail in case of a failure, then one IP is not enough. Do you control the DNS, and can you give real-time, geographically local answers that point to a data center near the sender? Can you live with traffic that is only roughly load balanced by the sender?

Look at Google. They give you five MX records (at the time of writing), but each one has a different priority. Each MX has only one A/AAAA record. But the answers are short-lived (low TTL), so that "every time" you request the IP you get a different one. And I'll bet you that behind every IP there are dozens of real servers answering the clients. I would also guess that these answers are geolocated so that you reach a data center on at least the same continent. You probably know that Google has thousands of servers handling billions or trillions of mails per day. That is definitely not done with just five MX hosts at increasing priorities. And I will not argue with them. Their staff has a higher pay grade than me.

mailq