2

I'm running BIND 9.8.1-P1 caching server on Ubuntu 12.04 LTS on kernel 3.2.0-86-generic-pae.

I've been banging my head against the wall trying to figure this out for weeks and I just have no idea what is going on and am having a hard time even recreating the problem as it is very intermittent.

What is happening is that when loading a page such as www.msn.com in Firefox with Firefox's Network Monitor open I occasionally see a few requests that have the DNS resolution part of their timing anywhere from 8000ms to as high as 37000ms resolution times. Of course once I reload the page it is cached and then works fine but even if I flush the cache on both the server and the client suddenly that domain is no longer a problem. I haven't been able to see any pattern in which domains have an issue. I just have no idea how to even being troubleshooting this because it is so hard to replicate for the same domain twice.

Here is all I have in named.conf.options

options {
        directory "/var/cache/bind";

        // If there is a firewall between you and nameservers you want
        // to talk to, you may need to fix the firewall to allow multiple
        // ports to talk.  See http://www.kb.cert.org/vuls/id/800113

        // If your ISP provided one or more IP addresses for stable
        // nameservers, you probably want to use them as forwarders.
        // Uncomment the following block, and insert the addresses replacing
        // the all-0's placeholder.

        forwarders {
                //OpenDNS
                208.67.222.222;
                208.67.220.220;
        };

        auth-nxdomain no;    # conform to RFC1035
        //listen-on-v6 { any; };
};

I'm really having a hard time figuring out what is going on let alone how to fix it so I hope someone here can help.

As requested here is the log records generated when turning query logging on and then loading www.wsj.com. Only 4 requests has slow DNS responses

tags.bluekai.com    11986ms

Jul 30 17:50:14 fs named[23539]: client 192.168.16.24#55215: query: tags.bluekai.com IN A + (192.168.16.3)
Jul 30 17:50:15 fs named[23539]: client 192.168.16.24#55215: query: tags.bluekai.com IN A + (192.168.16.3)
Jul 30 17:50:16 fs named[23539]: client 192.168.16.24#55215: query: tags.bluekai.com IN A + (192.168.16.3)
Jul 30 17:50:18 fs named[23539]: client 192.168.16.24#55215: query: tags.bluekai.com IN A + (192.168.16.3)
Jul 30 17:50:22 fs named[23539]: client 192.168.16.24#55215: query: tags.bluekai.com IN A + (192.168.16.3)

comcluster.cxense.com   24004ms

Jul 30 17:38:32 fs named[23539]: client 192.168.16.24#52469: query: comcluster.cxense.com IN A + (192.168.16.3)
Jul 30 17:50:14 fs named[23539]: client 192.168.16.24#56490: query: comcluster.cxense.com IN A + (192.168.16.3)
Jul 30 17:50:14 fs named[23539]: client 192.168.16.24#54692: query: comcluster.cxense.com IN A + (192.168.16.3)
Jul 30 17:50:15 fs named[23539]: client 192.168.16.24#54692: query: comcluster.cxense.com IN A + (192.168.16.3)    
Jul 30 17:50:15 fs named[23539]: client 192.168.16.24#56490: query: comcluster.cxense.com IN A + (192.168.16.3)
Jul 30 17:50:16 fs named[23539]: client 192.168.16.24#54692: query: comcluster.cxense.com IN A + (192.168.16.3)    
Jul 30 17:50:16 fs named[23539]: client 192.168.16.24#56490: query: comcluster.cxense.com IN A + (192.168.16.3)
Jul 30 17:50:18 fs named[23539]: client 192.168.16.24#54692: query: comcluster.cxense.com IN A + (192.168.16.3)    
Jul 30 17:50:18 fs named[23539]: client 192.168.16.24#56490: query: comcluster.cxense.com IN A + (192.168.16.3)
Jul 30 17:50:22 fs named[23539]: client 192.168.16.24#56490: query: comcluster.cxense.com IN A + (192.168.16.3)    
Jul 30 17:50:22 fs named[23539]: client 192.168.16.24#54692: query: comcluster.cxense.com IN A + (192.168.16.3)
Jul 30 17:50:26 fs named[23539]: client 192.168.16.24#56963: query: comcluster.cxense.com IN A + (192.168.16.3)
Jul 30 17:50:27 fs named[23539]: client 192.168.16.24#56963: query: comcluster.cxense.com IN A + (192.168.16.3)
Jul 30 17:50:28 fs named[23539]: client 192.168.16.24#56963: query: comcluster.cxense.com IN A + (192.168.16.3)
Jul 30 17:50:28 fs named[23539]: client 192.168.16.24#56963: query: comcluster.cxense.com IN A + (192.168.16.3)
Jul 30 17:50:30 fs named[23539]: client 192.168.16.24#56963: query: comcluster.cxense.com IN A + (192.168.16.3)
Jul 30 17:50:34 fs named[23539]: client 192.168.16.24#56963: query: comcluster.cxense.com IN A + (192.168.16.3)

cdn.cxpublic.com    23509ms

Jul 30 17:50:26 fs named[23539]: client 192.168.16.24#57469: query: cdn.cxpublic.com IN A + (192.168.16.3)
Jul 30 17:50:27 fs named[23539]: client 192.168.16.24#57469: query: cdn.cxpublic.com IN A + (192.168.16.3)
Jul 30 17:50:28 fs named[23539]: client 192.168.16.24#57469: query: cdn.cxpublic.com IN A + (192.168.16.3)
Jul 30 17:50:30 fs named[23539]: client 192.168.16.24#57469: query: cdn.cxpublic.com IN A + (192.168.16.3)
Jul 30 17:50:34 fs named[23539]: client 192.168.16.24#57469: query: cdn.cxpublic.com IN A + (192.168.16.3)

www.facebook.com    33666ms

Jul 30 17:39:37 fs named[23539]: client 192.168.16.24#54063: query: www.facebook.com IN A + (192.168.16.3)
Jul 30 17:50:26 fs named[23539]: client 192.168.16.24#63809: query: www.facebook.com IN A + (192.168.16.3)
Jul 30 17:50:26 fs named[23539]: client 192.168.16.24#61280: query: www.facebook.com IN A + (192.168.16.3)
Jul 30 17:50:27 fs named[23539]: client 192.168.16.24#63809: query: www.facebook.com IN A + (192.168.16.3)
Jul 30 17:50:27 fs named[23539]: client 192.168.16.24#61280: query: www.facebook.com IN A + (192.168.16.3)
Jul 30 17:50:28 fs named[23539]: client 192.168.16.24#63809: query: www.facebook.com IN A + (192.168.16.3)
Jul 30 17:50:28 fs named[23539]: client 192.168.16.24#61280: query: www.facebook.com IN A + (192.168.16.3)
Jul 30 17:50:30 fs named[23539]: client 192.168.16.24#63809: query: www.facebook.com IN A + (192.168.16.3)
Jul 30 17:50:30 fs named[23539]: client 192.168.16.24#61280: query: www.facebook.com IN A + (192.168.16.3)
Jul 30 17:50:34 fs named[23539]: client 192.168.16.24#63809: query: www.facebook.com IN A + (192.168.16.3)
Jul 30 17:50:34 fs named[23539]: client 192.168.16.24#61280: query: www.facebook.com IN A + (192.168.16.3)
Jul 30 17:50:38 fs named[23539]: client 192.168.16.24#64685: query: www.facebook.com IN A + (192.168.16.3)
Jul 30 17:50:39 fs named[23539]: client 192.168.16.24#64685: query: www.facebook.com IN A + (192.168.16.3)
Jul 30 17:50:40 fs named[23539]: client 192.168.16.24#64685: query: www.facebook.com IN A + (192.168.16.3)
Jul 30 17:50:42 fs named[23539]: client 192.168.16.24#64685: query: www.facebook.com IN A + (192.168.16.3)
Jul 30 17:50:46 fs named[23539]: client 192.168.16.24#64685: query: www.facebook.com IN A + (192.168.16.3)
Jul 30 17:50:50 fs named[23539]: client 192.168.16.24#57121: query: www.facebook.com IN A + (192.168.16.3)
Jul 30 17:50:51 fs named[23539]: client 192.168.16.24#57121: query: www.facebook.com IN A + (192.168.16.3)
Jul 30 17:50:52 fs named[23539]: client 192.168.16.24#57121: query: www.facebook.com IN A + (192.168.16.3)
Jul 30 17:50:54 fs named[23539]: client 192.168.16.24#57121: query: www.facebook.com IN A + (192.168.16.3)
  • Did you enable query logging? `rndc querylog`, then check syslog. – Wouter Verhelst Jul 30 '15 at 10:53
  • @WouterVerhelst I enabled query logging for a minute and loaded up www.wsj.com in Firefox. In the network monitor there were only 4 requests out of maybe 100 that had a slow DNS response time. I'm listing them in my original question along with the corresponding records from the syslog – Brian LaPan Jul 30 '15 at 22:12
  • have you done a capture to see exactly where the pause is occuring? I suspect your issue is related to ipv6 lookups (not ipv6 traffic, just the lookups) – Rick Buford Jul 31 '15 at 12:44
  • @RickBuford You mean packet capture? I tried firing up Wireshark and capturing while I was loading cnn.com in Firefox but I'm afraid I'm not really sure what to look for as I've never really done any packet capturing before. Could you provide some guidance on how to find what you are looking for to confirm your suspicions as to the cause of they delays? – Brian LaPan Jul 31 '15 at 14:29
  • 1) make a note of one of the queries that seems to take a long time while the packet capture is running; 2) stop the capture and then put in the filter `dns.qry.name==""` without the `<>`; 3) look thru those packets to find conversations with longer delays between packets; 4) if you get real frisky, try some I/O graphs: https://ask.wireshark.org/questions/3678/dns-transaction-latency – Rick Buford Aug 01 '15 at 02:31

1 Answers1

0

Thank you to everyone who contributed to helping me solve this but I appear to have accidentally solved the problem. Replacing the router appears to have eliminated the problem entirely. I have done extensive testing loading various pages from the Alexa 500 in Firefox and I am no longer able to recreate the problem at all. The router I replaced was very old and I wasn't replacing it in an attempt to solve this problem but it appears that the old router was the source of the issue I was experiencing.

For the record the old router was a Netgear WGR614v5 and I replaced it with a TP-LINK Archer C7.