
I have been wracking my brain trying to resolve an issue using BIND for DNS resolution on CentOS.

The setup I have is not typical (this was inherited).

Basically, on the server there is a network namespace called gi; this is where the named service is run by a new service called srv-gi:

#!/bin/sh

start_service() {
        ip netns exec gi /usr/sbin/zebra -d -A 127.0.0.1 -f /etc/quagga/zebra.conf
        ip netns exec gi /usr/sbin/bgpd -d -A 127.0.0.1 -f /etc/quagga/bgpd.conf 
        #DNS service
        ip netns exec gi  /usr/sbin/named -u named -c /etc/gi-named.conf
}

start_service

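To confirm that the services actually came up inside the namespace and that named is bound to port 53, something like the following can be run on the server. This is a diagnostic sketch, not part of the original setup: it assumes the gi namespace from the script above, requires root, and assumes `ss` from iproute2 is available.

```shell
#!/bin/sh
# Diagnostic sketch: verify the DNS service inside the "gi" namespace.
NS=gi

check_named() {
    # Is named listening on port 53 inside the namespace?
    ip netns exec "$NS" ss -tulpn | grep ':53' || echo "named is not listening in $NS"
    # Which addresses exist inside the namespace?
    ip netns exec "$NS" ip addr show
}

# Only run when the namespace actually exists (i.e. on the real server).
if ip netns list 2>/dev/null | grep -qw "$NS"; then
    check_named
fi
```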

The named.conf file has also been renamed to gi-named.conf.

//
// named.conf
//
// Provided by Red Hat bind package to configure the ISC BIND named(8) DNS
// server as a caching only nameserver (as a localhost DNS resolver only).
//
// See /usr/share/doc/bind*/sample/ for example named configuration files.
//
// See the BIND Administrator's Reference Manual (ARM) for details about the
// configuration located in /usr/share/doc/bind-{version}/Bv9ARM.html

options {
        listen-on port 53 { Public IP; };
        #listen-on-v6 port 53 { ::1; };
        directory       "/var/named";
        dump-file       "/var/named/data/cache_dump.db";
        statistics-file "/var/named/data/named_stats.txt";
        memstatistics-file "/var/named/data/named_mem_stats.txt";
        recursing-file  "/var/named/data/named.recursing";
        secroots-file   "/var/named/data/named.secroots";
        allow-query     { any; };
        allow-query-on  { PublicIP; };

        /*
         - If you are building an AUTHORITATIVE DNS server, do NOT enable recursion.
         - If you are building a RECURSIVE (caching) DNS server, you need to enable
           recursion.
         - If your recursive DNS server has a public IP address, you MUST enable access
           control to limit queries to your legitimate users. Failing to do so will
           cause your server to become part of large scale DNS amplification
           attacks. Implementing BCP38 within your network would greatly
           reduce such attack surface
        */
        recursion yes;
        allow-query-cache { Internal Range; };
        allow-query-cache-on  { PublicIP; };



        query-source address Public IP ;

        dnssec-enable yes;
        dnssec-validation yes;

        /* Path to ISC DLV key */
        bindkeys-file "/etc/named.iscdlv.key";

        managed-keys-directory "/var/named/dynamic";

        pid-file "/run/named/named.pid";
        session-keyfile "/run/named/session.key";
};


logging
{
/*      If you want to enable debugging, eg. using the 'rndc trace' command,
 *      named will try to write the 'named.run' file in the $directory (/var/named).
 *      By default, SELinux policy does not allow named to modify the /var/named directory,
 *      so put the default debug log file in data/ :
 */
        /*channel default_debug {
                print-time yes;
                print-category yes;
                print-severity yes;
                file "data/named.run";
                severity dynamic;
        };*/
        channel queries_log {
                file "/var/log/queries" versions 1 size 20m;
                print-time yes;
                print-category yes;
                print-severity yes;
                severity debug 3;
        };

        category queries { queries_log; };
        category client { queries_log;  };
};

zone "." IN {
        type hint;
        file "named.ca";
};

include "/etc/named.rfc1912.zones";
include "/etc/named.root.key";
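Before (re)starting the service, the renamed configuration can be validated in place. A minimal sketch, assuming `named-checkconf` from the bind package is installed on the server:

```shell
# Validate the renamed BIND config before starting named in the namespace.
CONF=/etc/gi-named.conf
if command -v named-checkconf >/dev/null 2>&1 && [ -f "$CONF" ]; then
    named-checkconf "$CONF" && echo "$CONF: syntax OK"
else
    echo "skipping: named-checkconf or $CONF not available on this machine"
fi
```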

Also note that I have a Quagga router configured to allow DNS resolution via the public IP.

/etc/quagga/bgpd.conf

!
! Zebra configuration saved from vty
!   2019/10/11 10:11:45
!
!
router bgp AS
 bgp router-id PublicIP
 network PublicIP/32
 network CoreIP/32
 neighbor DUB1-WGW peer-group
 neighbor DUB1-WGW remote-as AS
 neighbor DUB1-WGW soft-reconfiguration inbound
 neighbor DUB1-WGW route-map XXXXX out
 neighbor CoreBGPIP peer-group DUB1-WGW
 neighbor CoreBGPIP peer-group DUB1-WGW
!
ip prefix-list XXXX seq 5 permit PublicIP/32
ip prefix-list XXXX seq 10 permit PrivateIP/32
!
route-map DNS_TO_GI permit 10
 match ip address prefix-list XXXXX
!
line vty
!

/etc/quagga/zebra.conf

!
! Zebra configuration saved from vty
!   2019/10/11 10:11:45
!
hostname hostname
!
interface ens160
 ipv6 nd suppress-ra
!
interface ens192
 ipv6 nd suppress-ra
!
interface ens192.890
 ipv6 nd suppress-ra
!
interface ens192.892
 ipv6 nd suppress-ra
!
interface XX
 ipv6 nd suppress-ra
!
interface lo
!
ip prefix-list XX seq 5 permit PublicIP3/32
ip prefix-list XX seq 10 permit PrivateIP/32
!
route-map XXXX permit 10
 match ip address prefix-list XXX
!
!
!
line vty
!

# show ip route
Codes: K - kernel route, C - connected, S - static, R - RIP,
       O - OSPF, I - IS-IS, B - BGP, A - Babel,
       > - selected route, * - FIB route

B>* 0.0.0.0/0 [20/10] via neighbor IP, ens192.892, 00:02:18
C>* 127.0.0.0/8 is directly connected, lo
C>* Public IP/32 is directly connected, lo
C>* NeighborSubnet/30 is directly connected, ens192.890
C>* NeighborIP/30 is directly connected, ens192.892
C>* LocalIP/32 is directly connected, lo
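The kernel's view of routing inside the namespace can be compared against what Quagga reports. A diagnostic sketch, assuming the gi namespace from the service script above and that `vtysh` from the quagga package is installed:

```shell
# Compare Quagga's RIB with the kernel FIB inside the namespace.
NS=gi

show_routes() {
    ip netns exec "$NS" ip route show              # kernel view
    ip netns exec "$NS" vtysh -c 'show ip route'   # Quagga view (if vtysh is installed)
}

# Only run where the namespace actually exists (requires root).
if ip netns list 2>/dev/null | grep -qw "$NS"; then
    show_routes
fi
```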

I am testing resolution using a test APN. While I can get resolution on one APN, as soon as I introduce a second APN I just encounter the following errors in a tcpdump:

11:29:38.065284 IP PublicIP.domain > internal IP.p2pcommunity: 30622 ServFail 0/0/0 (44)
11:29:38.265736 IP PublicIP.domain > internal IP.32209: 12606 ServFail 0/0/0 (37)
11:29:38.266037 IP PublicIP.domain > internal IP.10793: 26678 ServFail 0/0/0 (37)
11:29:38.295727 IP PublicIP.domain > internal IP.ibm_wrless_lan: 23483 ServFail 0/0/0 (33)
11:29:38.296038 IP PublicIP.domain > internal IP.22097: 8347 ServFail 0/0/0 (33)
11:29:38.297532 IP PublicIP.domain > internal IP.31026: 23400 ServFail 0/0/0 (38)
11:29:38.298117 IP PublicIP.domain > internal IP.23707: 26481 ServFail 0/0/0 (38)

and from /var/log/queries

22-Sep-2020 11:31:07.552 client: debug 3: client InternalIP#61793 (www.facebook.com): error
22-Sep-2020 11:31:07.552 client: debug 3: client InternalIP#61793 (www.facebook.com): send
22-Sep-2020 11:31:07.552 client: debug 3: client InternalIP#61793 (www.facebook.com): sendto
22-Sep-2020 11:31:07.552 client: debug 3: client InternalIP#48008 (2.android.pool.ntp.org): error
22-Sep-2020 11:31:07.552 client: debug 3: client InternalIP#61793 (www.facebook.com): senddone
22-Sep-2020 11:31:07.552 client: debug 3: client InternalIP#61793 (www.facebook.com): next
22-Sep-2020 11:31:07.552 client: debug 3: client InternalIP#61793 (www.facebook.com): endrequest
22-Sep-2020 11:31:07.553 client: debug 3: client InternalIP#48008 (2.android.pool.ntp.org): send
22-Sep-2020 11:31:07.553 client: debug 3: client InternalIP#48008 (2.android.pool.ntp.org): sendto
22-Sep-2020 11:31:07.553 client: debug 3: client InternalIP#48008 (2.android.pool.ntp.org): senddone
22-Sep-2020 11:31:07.553 client: debug 3: client InternalIP#48008 (2.android.pool.ntp.org): next
22-Sep-2020 11:31:07.553 client: debug 3: client InternalIP#48008 (2.android.pool.ntp.org): endrequest

I am really unsure how to resolve this issue; any pointers or advice would be greatly appreciated.

Outputs of the dig commands:

dig facebook.com

; <<>> DiG 9.9.4-RedHat-9.9.4-74.el7_6.1 <<>> facebook.com
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 7204
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4000
;; QUESTION SECTION:
;facebook.com.          IN  A

;; ANSWER SECTION:
facebook.com.       93  IN  A   31.13.86.36

;; Query time: 2 msec
;; SERVER: internal DNS#53(Internal DNS)
;; WHEN: Tue Sep 22 19:38:58 UTC 2020
;; MSG SIZE  rcvd: 57


dig @PublicIP facebook.com

; <<>> DiG 9.9.4-RedHat-9.9.4-74.el7_6.1 <<>> @PublicIP facebook.com
; (1 server found)
;; global options: +cmd
;; connection timed out; no servers could be reached

dig @208.67.222.222 facebook.com

; <<>> DiG 9.9.4-RedHat-9.9.4-74.el7_6.1 <<>> @208.67.222.222 facebook.com
; (1 server found)
;; global options: +cmd
;; connection timed out; no servers could be reached
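Since the public IP and the BGP-learned default route only exist inside the gi namespace, the same dig tests are worth repeating from within it. A sketch (the `+time`/`+tries` options just keep the test short if the path is broken; the loopback query is expected to fail because named only listens on the public IP):

```shell
# Repeat the resolution tests from inside the namespace, where the
# public IP and the learned default route actually live.
NS=gi

ns_dig() {
    ip netns exec "$NS" dig +time=2 +tries=1 "$@"
}

# Only run where the namespace actually exists (requires root).
if ip netns list 2>/dev/null | grep -qw "$NS"; then
    ns_dig @208.67.222.222 facebook.com   # external resolver via the namespace routes
    ns_dig @127.0.0.1 facebook.com        # fails unless named also listens on loopback
fi
```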

ip netns exec gi tcpdump -n -f 'port 53' -i any
09:55:35.676645 IP PublicIP.domain > InternalIP.46571: 36451 ServFail 0/0/0 (32)
09:55:35.676939 IP PublicIP.domain > InternalIP.37817: 52592 ServFail 0/0/0 (32)
09:55:35.677865 IP PublicIP.domain > InternalIP.41737: 52624 ServFail 0/0/0 (32)
09:55:35.713870 IP PublicIP.34042 > 193.0.14.129.domain: 11264 [1au] A? mtalk.google.com. (45)
09:55:35.713914 IP PublicIP.11218 > 193.0.14.129.domain: 3623 [1au] NS? . (28)
09:55:35.768649 IP 193.0.14.129.domain > PublicIP.11218: 3623*-| 0/0/1 (28)
09:55:35.784456 IP 193.0.14.129.domain > PublicIP.34042: 11264-| 0/0/1 (45)
09:55:36.045130 IP PublicIP.wcbackup > 192.112.36.4.domain: 28368 A? update.googleapis.com. (39)
09:55:36.063323 IP InternalIP.49382 > PublicIP.domain: 57145+ A? accounts.google.com. (37)
09:55:36.064459 IP PublicIP.48169 > 193.0.14.129.domain: 15825 [1au] A? accounts.google.com. (48)
09:55:36.065883 IP APNIP.54312 > PublicIP.domain: 53585+ A? accounts.google.com. (37)
09:55:36.080202 IP 192.112.36.4.domain > PublicIP.wcbackup: 28368- 0/13/14 (499)
09:55:36.120905 IP 193.0.14.129.domain > PublicIP.48169: 15825- 0/15/27 (1182)
09:55:36.170289 IP InternalIP.59759 > PublicIP.domain: 52061+ A? www.google.com. (32)
09:55:36.224316 IP PublicIP.5346 > 192.112.36.4.domain: 40438 A? www.facebook.com. (34)
09:55:36.257993 IP 192.112.36.4.domain > PublicIP.5346: 40438- 0/13/14 (494)
09:55:36.441576 IP PublicIP.domain > InternalIP.65408: 45517 ServFail 0/0/0 (39)
09:55:36.441666 IP PublicIP.domain > InternalIP.60664: 54663 ServFail 0/0/0 (39)
09:55:36.442994 IP PublicIP.domain > InternalIP.48634: 56799 ServFail 0/0/0 (39)
09:55:36.443474 IP PublicIP.domain > InternalIP.36045: 34980 ServFail 0/0/0 (39)

1 Answer

  1. It would help us help you if you explained your network architecture (e.g. APN1, APN2 and what it is you want to achieve).
  2. It seems like something is going on with the Quagga configuration; you may want to post that as well. Quagga is typically used with BIND to route traffic to the nearest name resolver. For example, OpenDNS (I have no affiliation with them and am using it as an example only) publishes two IP addresses, 208.67.222.222 and 208.67.220.220, but queries originating from different parts of the world go to the server nearest to them (e.g. queries originating in Europe are resolved by an OpenDNS server in Europe). All this is orchestrated using Quagga.
  • Hi @tinkertwain, I will answer both questions as best I can. 1. This DNS server is part of a 2-server cluster, each server having its own public IP. Our customers are divided into different APNs based on their service. The APNs I have set up are test APNs that, once configured on an end-user device, should get resolution through the primary and secondary DNS servers. In terms of Quagga configuration, are you referring to the 'router config' or the conf file in Linux? – Dunner1991 Sep 22 '20 at 14:27
  • By APN you mean ASN, right? In order for things to work, you need to make sure that Quagga has the correct routing configuration for your test ASNs. You can use Quagga's vtysh to display the current routing configuration. Something like: ```show ip route``` – tinkertwain Sep 22 '20 at 18:33
  • So our customer connects to an APN; when they connect they are given an IP from the range assigned to the APN. When they attempt to browse the internet, the query comes in from the customer to our packet core, and from there it is directed via VLAN to the DNS server, which has a public IP assigned; this is the IP that is supposed to query the internet. However, while I get successful queries for periods of time, if I change the APN, i.e. the IP range assigned to the user, I continuously receive a ServFail response. Additionally, I cannot get dig to work using the ip netns exec command. – Dunner1991 Sep 22 '20 at 18:49
  • Can you post the output of some simple dig commands while on the DNS server please: ```dig @198.6.1.2 facebook.com``` (using a resolver from UUnet) and ```dig @127.0.0.1 facebook.com``` (using your DNS server as resolver). – tinkertwain Sep 22 '20 at 19:12
  • I added the dig outputs in the summary above. When I try to use dig @DNSPublicIP I get a connection timed out. However, I can see this IP resolving queries via tcpdump for certain periods, which is what I find quite strange. Obviously I am missing something, just not sure what. – Dunner1991 Sep 22 '20 at 19:47
  • What is your /etc/resolv.conf set to? When you issued ```dig facebook.com``` it resolved, and the rest did not. Also, did you get a chance to look into the Quagga routing table? – tinkertwain Sep 22 '20 at 19:52
  • So /etc/resolv.conf is search domain.name nameserver internal IP address (I have not added the public DNS IP here). – Dunner1991 Sep 22 '20 at 20:06
  • show ip route in quagga router s# show ip route Codes: K - kernel route, C - connected, S - static, R - RIP, O - OSPF, I - IS-IS, B - BGP, A - Babel, > - selected route, * - FIB route B>* 0.0.0.0/0 [20/10] via neighbor IP, ens192.892, 00:02:18 C>* 127.0.0.0/8 is directly connected, lo C>* Public IP/32 is directly connected, lo C>* NeighborSubnet/30 is directly connected, ens192.890 C>* NeighborIP/30 is directly connected, ens192.892 C>* LocalIP/32 is directly connected, lo – Dunner1991 Sep 22 '20 at 20:06
  • What is the difference between the IP address range that works vs the IP address range that does not? Can you validate that both follow the same route on Quagga as per the routing table? – tinkertwain Sep 22 '20 at 20:51
  • Hi, the only difference is the range, i.e. 10.0.1.0/24 and 10.0.2.0/24. The path that traffic on each APN takes is the same from our packet core to the internet: traffic is carried via VLAN X to the DNS server, and the Quagga router learns the 0.0.0.0 from the network core. The thing is I can resolve successfully on each APN for periods of time, but then for no reason I can ascertain it stops resolving and a service restart or system reboot is required. – Dunner1991 Sep 22 '20 at 21:11
  • So, let me understand this clearly. Based on your original description you said that it works from one APN and not from another. Now you say that it works from both in the beginning and after some time it stops working and requires a service restart. If the latter is the case, can you conduct a controlled test, where you have only one IP on an APN, keep testing, and observe what is happening to the routes on your DNS server. – tinkertwain Sep 22 '20 at 23:26
  • So testing this morning with one IP. Using tcpdump I see queries come into the DNS and querying the public IP; however, the return queries from the PublicIP return this type of error ***49715 ServFail 0/0/0 (34)***. If I check my firewall I see the query leave, but it seems to only get a response from the root servers and goes no further. The inconsistency is troubling, as is the fact that when using namespaces dig does not resolve. I will continue to run tests. – Dunner1991 Sep 23 '20 at 08:25
  • Can you perform a tcpdump capture on the DNS server for traffic trying to resolve the name, please? This will tell you what is going on with the name resolution. – tinkertwain Sep 23 '20 at 13:14
  • Apologies for the delay, I have a tcpdump showing the issue, I have posted an excerpt from the tcpdump in the summary above – Dunner1991 Sep 24 '20 at 09:56
  • Also I have resolved the dig issue I was seeing, and now I get an output using dig @DNSIP but i get status REFUSED on all requests – Dunner1991 Sep 24 '20 at 10:04