Slow west coast authoritative DNS responses from Google Cloud

Question

Our company uses Google Cloud Platform for hosting and DNS (we're not using GCP as a registrar). Recently we were looking at some metrics and saw that slow DNS resolution, often over 100ms, was a significant contributor to our overall page load times. We set up a Datadog DNS synthetic test, and DNS resolution seems consistently slower from the west coast. The synthetic tests are run from within AWS and use 8.8.8.8 (Google Public DNS); I see the same behavior when pointing the tests at 1.1.1.1 (Cloudflare DNS).

Ohio & Oregon: https://i.stack.imgur.com/5Sr8o.png
Virginia & California: https://i.stack.imgur.com/WSc8f.png

We also checked with DNS Perf: https://i.stack.imgur.com/fPLRw.jpg

The NS records given to us by Google Cloud are:

ns-cloud-b1.googledomains.com.
ns-cloud-b2.googledomains.com.
ns-cloud-b3.googledomains.com.
ns-cloud-b4.googledomains.com.

and we have configured all of them at our registrar.

~ dig NS --.shop +short
ns-cloud-b4.googledomains.com.
ns-cloud-b1.googledomains.com.
ns-cloud-b2.googledomains.com.
ns-cloud-b3.googledomains.com.

I know that GCP discourages using ping/ICMP because it's not necessarily representative of the latency of other traffic, but the ping times from the west coast imply that the packets are going cross-country:

PING ns-cloud-b1.googledomains.com (216.239.32.107): 56 data bytes
64 bytes from 216.239.32.107: icmp_seq=0 ttl=58 time=65.699 ms
64 bytes from 216.239.32.107: icmp_seq=1 ttl=58 time=67.458 ms
64 bytes from 216.239.32.107: icmp_seq=2 ttl=58 time=66.873 ms

PING ns-cloud-b2.googledomains.com (216.239.34.107): 56 data bytes
64 bytes from 216.239.34.107: icmp_seq=0 ttl=58 time=85.820 ms
64 bytes from 216.239.34.107: icmp_seq=1 ttl=58 time=87.567 ms
64 bytes from 216.239.34.107: icmp_seq=2 ttl=58 time=84.580 ms

I also confirmed this latency with the Cogent looking glass traceroute/ping: https://www.cogentco.com/en/looking-glass. The GCP docs say:

Your users will have reliable, low-latency access from anywhere in the world using our anycast name servers.

but the performance we're seeing suggests that our DNS queries are being served from a central location. We are using the .shop TLD, but I saw similar performance for another domain using the .app TLD, so the problem doesn't seem to be the TLD DNS servers.

Extra Data

The latency we're concerned with is from our users' devices over the public internet, but to remove ISP differences, here is some more data on DNS query latency from VMs within GCP. The latency isn't terrible, but it varies noticeably by location, and from within GCP I would expect all the name servers to be quick (<25ms).

# us-central1
alex@alex-1-central:~$ dig @ns-cloud-b1.googledomains.com --.shop | grep time
;; Query time: 16 msec
alex@alex-1-central:~$ dig @ns-cloud-b2.googledomains.com --.shop | grep time
;; Query time: 28 msec
alex@alex-1-central:~$ dig @ns-cloud-b3.googledomains.com --.shop | grep time
;; Query time: 20 msec
alex@alex-1-central:~$ dig @ns-cloud-b4.googledomains.com --.shop | grep time
;; Query time: 0 msec

# us-west2
alex@alex-1-west2:~$ dig @ns-cloud-b1.googledomains.com --.shop | grep time
;; Query time: 40 msec
alex@alex-1-west2:~$ dig @ns-cloud-b2.googledomains.com --.shop | grep time
;; Query time: 60 msec
alex@alex-1-west2:~$ dig @ns-cloud-b3.googledomains.com --.shop | grep time
;; Query time: 52 msec
alex@alex-1-west2:~$ dig @ns-cloud-b4.googledomains.com --.shop | grep time
;; Query time: 48 msec

# us-east1
alex@alex-1-east:~$ dig @ns-cloud-b1.googledomains.com --.shop | grep time
;; Query time: 28 msec
alex@alex-1-east:~$ dig @ns-cloud-b2.googledomains.com --.shop | grep time
;; Query time: 4 msec
alex@alex-1-east:~$ dig @ns-cloud-b3.googledomains.com --.shop | grep time
;; Query time: 12 msec
alex@alex-1-east:~$ dig @ns-cloud-b4.googledomains.com --.shop | grep time
;; Query time: 32 msec
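
A single `dig` run per name server is noisy, so one way to firm up the numbers above would be to repeat each query and compare medians. The following is a minimal Python sketch under stated assumptions: `dig` is available on the PATH, the helper names (`parse_query_time`, `measure`, `summarize`, `report`) are made up for illustration, and the zone name is supplied by the caller.

```python
import re
import statistics
import subprocess

# The four name servers listed in the question.
NAMESERVERS = [
    "ns-cloud-b1.googledomains.com",
    "ns-cloud-b2.googledomains.com",
    "ns-cloud-b3.googledomains.com",
    "ns-cloud-b4.googledomains.com",
]

# Matches the ";; Query time: 16 msec" line that dig prints.
QUERY_TIME_RE = re.compile(r"^;; Query time: (\d+) msec", re.MULTILINE)


def parse_query_time(dig_output):
    """Return the query time in ms from dig output, or None if absent."""
    match = QUERY_TIME_RE.search(dig_output)
    return int(match.group(1)) if match else None


def measure(nameserver, domain, samples=10):
    """Query one authoritative server repeatedly and collect query times."""
    times = []
    for _ in range(samples):
        result = subprocess.run(
            ["dig", f"@{nameserver}", domain],
            capture_output=True,
            text=True,
            check=False,
        )
        query_time = parse_query_time(result.stdout)
        if query_time is not None:
            times.append(query_time)
    return times


def summarize(times):
    """Reduce a non-empty list of samples to min / median / max."""
    return {
        "min": min(times),
        "median": statistics.median(times),
        "max": max(times),
    }


def report(domain, samples=10):
    """Print one summary line per name server for the given zone."""
    for nameserver in NAMESERVERS:
        times = measure(nameserver, domain, samples)
        print(nameserver, summarize(times) if times else "no answer")
```

Calling `report()` with the zone name from a VM in each region would give a per-region, per-server median that is easier to compare than one-off readings. It still measures only the path from that VM to the authoritative servers, not what end users see through a caching resolver.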

I also set up a test zone in AWS Route53 and queried its authoritative nameservers directly from the same GCP VMs (just as I did above for GCP) and got <20ms response times from each location.

Hi, can you share where you are running the latency test from? Is it from a resource outside GCP to one of the Cloud DNS name servers, or from a resource inside GCP? — Yvan G., Mar 15 '23 at 23:50
DNS latency will depend on where you measure it. One reliable way is to measure from GCP resources. If you are using GCP Cloud DNS and query from a GCP VM, your request stays within Google's internal network, resulting in lower network latency. But if your queries originate outside GCP, for example from your local network, you may experience higher latency because they travel over your ISP's network. In that case, the latency will depend on each ISP and its connections to GCP. — Yvan G., Mar 15 '23 at 23:50
Which DNS servers are you sending your requests to: directly to the GCP DNS servers, or to your ISP's DNS servers? DNS responses are cached. It is rare that a customer will directly make requests to your domain's DNS servers. Instead, customers make requests to their ISP's DNS servers or to a global service such as Google DNS (not the same thing as Google Cloud DNS). Edit your question with details on what you are measuring and from where/to. Note: your question included screenshots without data, which makes them sort of useless. — John Hanley, Mar 16 '23 at 00:24
Thanks @YvanG. I've added an extra data section with latencies from GCP VMs in various regions to each of the GCP Cloud DNS name servers. It looks like there's a decent amount of variability between regions and nameservers. Maybe this is expected, but I would've expected to see a lot more consistency from within GCP. — Alex H, Mar 16 '23 at 01:07
And thanks @JohnHanley, I also edited the question and clarified that the Datadog synthetics are being run from within AWS and pointing to 8.8.8.8 (but I've seen the same problem pointing at 1.1.1.1). — Alex H, Mar 16 '23 at 01:08
@AlexH - If you send DNS queries to Google or Cloudflare, then you are measuring their DNS server's response time and not the response time of your Google Cloud DNS servers. — John Hanley, Mar 16 '23 at 01:16
@AlexH - Trying to measure this is almost pointless because even if you can identify something, that something will vary based on time. DNS name resolution is a global collection of systems all working together. Some systems will cache answers if your service has high global demand, otherwise, a refresh is required. You have no control over the DNS servers or resolvers that your clients configure. In the real world, DNS name resolution will have little impact on most services because once the client resolves the DNS name, connections are made using the IP address and not the DNS name. — John Hanley, Mar 16 '23 at 01:26
@JohnHanley, I understand that there may not be a ton of value in this, but I do think it's very surprising that users on the west coast have double or triple the latency. It seems likely to me that the higher latency is related to the fact that the query time between a GCP VM in us-west2 and each nameserver is double/triple the query time between a GCP VM in us-central or us-east and the nameservers (shown in the extra data section). Especially b/c I set up a test zone in Route53 and queried those authoritative nameservers from the same GCP VMs and got query times of <20ms from each one. — Alex H, Mar 16 '23 at 06:45
@AlexH - I am not saying there is no value. I am saying you cannot control what you are measuring. Take a look at how DNS is organized globally. It is not a point-to-point system. If your servers are HTTP, perform HTTP tests. DNS has little effect on real-world performance. DNS reliability is far more important. — John Hanley, Mar 16 '23 at 07:00
@JohnHanley, we discovered this problem by running synthetic HTTP tests and seeing that DNS resolution was taking a larger % of the total time than we were comfortable with. We'll likely run a test with Route53 and see if the metrics are meaningfully improved, thanks for your input. — Alex H, Mar 16 '23 at 19:19
From the dns tag: DNS QUESTIONS MUST BE PROGRAMMING RELATED. Use this tag for programming questions related to writing code that interacts with the Domain Name System (DNS); for example, writing code that uses gethostbyname() — Rob, Mar 18 '23 at 15:59
The tag does not make your question off topic. Questions about dns are off topic. Please delete this. — Rob, Mar 19 '23 at 11:31

score 0 · Answer 1 · answered Mar 16 '23 at 17:19

DNS latency will depend on where you measure it. One reliable way is to measure from GCP resources. If you are using GCP Cloud DNS and query from a GCP VM, your request stays within Google's internal network, resulting in lower network latency. But if your queries originate outside GCP, for example from your local network, you may experience higher latency because they travel over your ISP's network. In that case, the latency will depend on each ISP and its connections to GCP.

As I mentioned above, I have added data showing that DNS resolution, even from a GCP VM, is slower than I'd expect from the west coast. — Alex H, Mar 16 '23 at 18:11
