
We make API calls to DocuSign that occasionally fail with "getaddrinfo: Name or service not known" errors. Investigating further, we see that name resolution sometimes fails when we connect, but only from our West datacenter. It seems the GLB DNS for US West can take a very long time to resolve, causing DNS client timeouts when a lookup takes more than 10s.

$ dig @1.1.1.1 www.docusign.net

; <<>> DiG 9.9.5-3ubuntu0.8-Ubuntu <<>> @1.1.1.1 www.docusign.net
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 49468
;; flags: qr rd ra; QUERY: 1, ANSWER: 3, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 1452
;; QUESTION SECTION:
;www.docusign.net.              IN      A

;; ANSWER SECTION:
www.docusign.net.       22      IN      CNAME   www-geo.docusign.net.akadns.net.
www-geo.docusign.net.akadns.net. 22 IN  CNAME   www-west.docusign.net.akadns.net.
www-west.docusign.net.akadns.net. 22 IN A       162.248.184.27

;; Query time: 1 msec
;; SERVER: 1.1.1.1#53(1.1.1.1)
;; WHEN: Thu Oct 04 13:16:43 EDT 2018
;; MSG SIZE  rcvd: 126

Above is a good result, which took 1 msec (cached).

$ dig @1.1.1.1 www.docusign.net

; <<>> DiG 9.9.5-3ubuntu0.8-Ubuntu <<>> @1.1.1.1 www.docusign.net
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 21193
;; flags: qr rd ra; QUERY: 1, ANSWER: 3, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 1452
;; QUESTION SECTION:
;www.docusign.net.              IN      A

;; ANSWER SECTION:
www.docusign.net.       6       IN      CNAME   www-geo.docusign.net.akadns.net.
www-geo.docusign.net.akadns.net. 6 IN   CNAME   www-west.docusign.net.akadns.net.
www-west.docusign.net.akadns.net. 6 IN  A       162.248.184.27

;; Query time: 2725 msec
;; SERVER: 1.1.1.1#53(1.1.1.1)
;; WHEN: Thu Oct 04 13:21:29 EDT 2018
;; MSG SIZE  rcvd: 126

This one is worse: it took nearly 3s. During testing we've seen lookups take over 12s, which will time out many DNS clients and the applications that depend on them.
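To see the same slowness from the application's point of view rather than through `dig`, the lookup can be timed through `getaddrinfo` directly. The following is only a minimal sketch: the function names `time_lookups` and `slow_or_failed`, the port 443, and the 1-second threshold are assumptions for illustration, not part of our actual setup.

```python
import socket
import time


def time_lookups(hostname, attempts=5, resolver=socket.getaddrinfo,
                 clock=time.monotonic):
    """Resolve `hostname` `attempts` times; return a (seconds, ok) pair per attempt."""
    results = []
    for _ in range(attempts):
        start = clock()
        try:
            # Port 443 is an assumption (HTTPS API calls).
            resolver(hostname, 443)
            ok = True
        except socket.gaierror:
            # gaierror is the exception behind "Name or service not known".
            ok = False
        results.append((clock() - start, ok))
    return results


def slow_or_failed(results, threshold_s=1.0):
    """Return the attempts that failed or took longer than `threshold_s` seconds."""
    return [(elapsed, ok) for elapsed, ok in results
            if not ok or elapsed > threshold_s]
```

Calling `time_lookups("www.docusign.net")` goes through the system resolver, so any local cache sits in front of it; to catch the slow refreshes the attempts would need to be spread over more than one TTL period.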

Since the TTL is set to 30s, every 30 seconds we have a chance of hitting a timeout: our app generates errors until a DNS lookup succeeds and service resumes. Unfortunately, this shows up as an error to our customers in our app.
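To put the 30s TTL in perspective, here is a rough back-of-the-envelope sketch. It assumes a single caching resolver that refills the record immediately after every expiry; `slow_fraction` is a purely hypothetical input, not something we measured.

```python
SECONDS_PER_DAY = 86_400


def refreshes_per_day(ttl_s):
    """Upper bound on upstream lookups per day for one continuously used cache."""
    return SECONDS_PER_DAY // ttl_s


def expected_slow_refreshes(ttl_s, slow_fraction):
    """Expected number of slow refreshes per day, given a hypothetical slow fraction."""
    return refreshes_per_day(ttl_s) * slow_fraction
```

With `ttl_s=30` this gives 2880 refreshes per day, so even a small fraction of slow lookups translates into a steady trickle of errors.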

We're able to work around this using hacks, but I'm curious whether anyone else is seeing this and how you've worked around it. Also, it might be good for people at DocuSign/Akamai to look into why the performance of the www-west.docusign.net.akadns.net record is so bad.
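For discussion, one possible shape for such a workaround is to remember the last good answer and reuse it when a lookup fails. This is a sketch under assumptions, not necessarily what we run: the class name `StaleOnErrorResolver` and the `max_stale_s` limit are made up for illustration.

```python
import socket
import time


class StaleOnErrorResolver:
    """Cache the last good answer per (host, port) and reuse it when a lookup fails."""

    def __init__(self, resolver=socket.getaddrinfo, max_stale_s=3600.0,
                 clock=time.monotonic):
        self._resolver = resolver
        self._max_stale_s = max_stale_s  # hypothetical limit on answer age
        self._clock = clock
        self._cache = {}  # (host, port) -> (timestamp, answer)

    def resolve(self, host, port):
        key = (host, port)
        try:
            answer = self._resolver(host, port)
        except socket.gaierror:
            cached = self._cache.get(key)
            if cached is not None and self._clock() - cached[0] <= self._max_stale_s:
                # Serve the stale answer instead of surfacing the failure.
                return cached[1]
            raise
        self._cache[key] = (self._clock(), answer)
        return answer
```

This only hides outright failures; it does not shorten a slow lookup, since `socket.getaddrinfo` takes no timeout argument and the resolver timeout is set at the OS level (for example via `options timeout:` and `attempts:` in `resolv.conf`). Serving a stale address also carries the risk that the GLB has deliberately moved traffic elsewhere.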

Patrick Mevzek
think410
  • Aside, there are warnings because of missing glues: http://dnsviz.net/d/www.docusign.net/W7Z7KA/dnssec/ – Patrick Mevzek Oct 04 '18 at 20:43
  • Did you try other public recursive nameservers: `8.8.8.8`, `9.9.9.9` or `80.80.80.80`? Also try with and without TCP (`+tcp`/`+notcp` options). – Patrick Mevzek Oct 04 '18 at 20:44
  • We tried a few different external public DNS servers (8.8.8.8 and 8.8.4.4) with the same results. Things look a lot better today; since this had been happening for ~1.5 months, I wonder whether something was changed or fixed somewhere. Note that this only happened when your GeoIP location pointed you to the www-west CNAME; the www-na record (which dnsviz gets) was fine all along. – think410 Oct 05 '18 at 17:11
  • From my location I currently get the `www-east` one :-) – Patrick Mevzek Oct 05 '18 at 23:11

0 Answers