
We use Route 53 for our private DNS, specifically the Multivalue Answer routing policy with no health checks, for dumb/lazy load balancing.

This has worked well for us until yesterday, when we started noticing odd behaviour.

We have a record set for ro.mysql.foo.bar.com, which contains the IPs of our read-only MySQL replicas.

If you run `dig`, the output reflects this:

dig A ro.mysql.foo.bar.com

;; QUESTION SECTION:
;ro.mysql.foo.bar.com. IN   A

;; ANSWER SECTION:
ro.mysql.foo.bar.com. 4 IN A    10.2.2.2
ro.mysql.foo.bar.com. 4 IN A    10.2.2.3
ro.mysql.foo.bar.com. 4 IN A    10.2.2.4

The expected behaviour is that when the mysql client connects to ro.mysql.foo.bar.com, it lands on one of the three hosts at random, in a dumb/random load-balanced kind of way. This worked fine until yesterday.

If I run

mysql -h ro.mysql.foo.bar.com -e "select @@hostname;"

I'd expect the output of @@hostname to vary between 10.2.2.2, 10.2.2.3, and 10.2.2.4.

Instead, traffic is not spread randomly across the three IPs; the client picks one and uses only that one.

+-----------------------------------+
| @@hostname                        |
+-----------------------------------+
| 10.2.2.3                          |
+-----------------------------------+

If we remove 10.2.2.3 from DNS, it stops using it. If we add it back in, it goes back to using only 10.2.2.3.
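
Since `dig` queries the DNS server directly, while clients such as `mysql` typically resolve names through the system resolver (`getaddrinfo`), one way to compare the two is to tally which address the system resolver hands back first. Below is a minimal Python sketch of that check, not something we run in production. It assumes IPv4 and port 3306; the function name `first_address_counts` and the injectable `resolve` parameter are made up for illustration.

```python
import socket
from collections import Counter


def first_address_counts(hostname, attempts=20, resolve=socket.getaddrinfo):
    """Resolve `hostname` repeatedly and count which IP comes back first.

    `resolve` defaults to the system resolver; it is a parameter only so
    that a stand-in can be swapped in for testing (an assumption of this
    sketch, not part of any library API).
    """
    counts = Counter()
    for _ in range(attempts):
        # Each entry is (family, type, proto, canonname, sockaddr);
        # for AF_INET, sockaddr is an (ip, port) tuple.
        infos = resolve(hostname, 3306, socket.AF_INET, socket.SOCK_STREAM)
        first_ip = infos[0][4][0]
        counts[first_ip] += 1
    return counts
```

Calling `first_address_counts("ro.mysql.foo.bar.com")` on an affected host and printing the result shows whether the first address varies at this layer. If one IP takes every attempt while `dig` keeps rotating, the fixed ordering is being introduced between the DNS answer and the client rather than by Route 53.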

We do run our own resolver via PowerDNS (which we're phasing out), but there doesn't appear to be any fault there that we can see. PowerDNS has had its DNS cache purged too.

  • If you `dig @ns-xxxx.awsdns-yy.zzz your-record.example.com` where `ns-xxxx.awsdns-yy.zzz` is one of the name servers assigned to your hosted zone by Route 53, does the returned record ordering vary? Or, if this is a private hosted zone, the NS records are possibly dummy values, so try `dig @169.254.169.253 your-record.example.com` from inside the VPC. – Michael - sqlbot Mar 14 '18 at 11:32
  • @Michael-sqlbot Yes, the ordering varies. We've tested this on the AWS nameservers and our own, both show the same behaviour. – adamstrawson Mar 14 '18 at 11:34
  • So dig varies. Does `ping your-record.example.com` repeatedly try to ping only one host, or does that also vary, but only the `mysql` client doesn't? – Michael - sqlbot Mar 14 '18 at 11:36
  • @Michael-sqlbot ping shows the same behaviour too, returns the single IP and does not vary – adamstrawson Mar 14 '18 at 11:38
  • Initial impression, this sounds like it could be the work of my old nemesis, [`nscd`](https://serverfault.com/q/729738/153161). – Michael - sqlbot Mar 14 '18 at 11:48
  • @Michael-sqlbot I read similar things too, but we don't have `nscd` installed on our servers – adamstrawson Mar 14 '18 at 11:56
  • And this has been working all along, but suddenly changed? – Michael - sqlbot Mar 14 '18 at 11:58
  • Let us [continue this discussion in chat](https://chat.stackexchange.com/rooms/74491/discussion-between-adamstrawson-and-michael-sqlbot). – adamstrawson Mar 14 '18 at 11:58
