
I am running a 2-node Oracle 12.1.0.2.0 ASM Flex Cluster using Oracle GNS (Grid Naming Service), which I believe uses zeroconf to create its ad hoc network.

Before GNS starts, DNS works: nslookup, dig, ping, and ssh all work for both local network names and www names (e.g. google.com). Here is what the routing table looks like BEFORE GNS starts:

 [root@lxcora03 ~]# route
 Kernel IP routing table
 Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
 default         vmem1.vmem.org  0.0.0.0         UG    0      0        0 eth0
 10.207.39.0     *               255.255.255.0   U     0      0        0 eth0
 172.0.0.0       *               255.0.0.0       U     0      0        0 eth5
 172.0.0.0       *               255.0.0.0       U     0      0        0 eth6
 192.0.0.0       *               255.0.0.0       U     0      0        0 eth1
 192.0.0.0       *               255.0.0.0       U     0      0        0 eth2
 192.0.0.0       *               255.0.0.0       U     0      0        0 eth3
 192.0.0.0       *               255.0.0.0       U     0      0        0 eth4
 [root@lxcora03 ~]# 

After GNS starts, new routes are added, and ping and ssh BREAK, while nslookup and dig continue to work. Here is the routing table after GNS has started. I suspect the problem is related to the new link-local (169.254.x.x) entries in the routing table, but I'm not sure.

 [root@lxcora03 ~]# route
 Kernel IP routing table
 Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
 default         vmem1.vmem.org  0.0.0.0         UG    0      0        0 eth0
 10.207.39.0     *               255.255.255.0   U     0      0        0 eth0
 link-local      *               255.255.192.0   U     0      0        0 eth1
 169.254.64.0    *               255.255.192.0   U     0      0        0 eth2
 169.254.128.0   *               255.255.192.0   U     0      0        0 eth3
 169.254.192.0   *               255.255.192.0   U     0      0        0 eth4
 172.0.0.0       *               255.0.0.0       U     0      0        0 eth5
 172.0.0.0       *               255.0.0.0       U     0      0        0 eth6
 192.0.0.0       *               255.0.0.0       U     0      0        0 eth1
 192.0.0.0       *               255.0.0.0       U     0      0        0 eth2
 192.0.0.0       *               255.0.0.0       U     0      0        0 eth3
 192.0.0.0       *               255.0.0.0       U     0      0        0 eth4
 [root@lxcora03 ~]# 
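One detail worth separating out: ping and ssh look names up through the C library's getaddrinfo(), which walks the sources on the hosts: line of /etc/nsswitch.conf (files, dns, and possibly mdns entries such as mdns4_minimal if nss-mdns is installed), while nslookup and dig build their own DNS queries and send them straight to the servers in /etc/resolv.conf. The minimal Python 3 sketch below resolves a name the ping/ssh way so its result can be compared with dig for the same name; it assumes Python 3 is available on the node, and the helper name nss_lookup is made up for illustration.

```python
import socket


def nss_lookup(name):
    """Hypothetical helper: resolve `name` the way ping and ssh do, via
    getaddrinfo(), which consults the hosts: line of /etc/nsswitch.conf.
    Returns the distinct addresses in the order returned, or an empty
    list if the lookup fails."""
    try:
        infos = socket.getaddrinfo(name, None, proto=socket.IPPROTO_TCP)
    except socket.gaierror:
        return []
    addresses = []
    for _family, _socktype, _proto, _canonname, sockaddr in infos:
        # sockaddr[0] is the address for both IPv4 and IPv6 results
        if sockaddr[0] not in addresses:
            addresses.append(sockaddr[0])
    return addresses
```

For example, nss_lookup("google.com") returning an empty list while `dig google.com` succeeds would point at the NSS path rather than at the DNS servers themselves.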

I'm wondering how to get www resolution working again. Do I need to add a static route? I'm an Oracle DBA, and I know enough about networking to build Open vSwitch, work with dhclient.conf, set up BIND, configure ifcfg-ethX files, edit resolv.conf, etc., but this problem has so far defied my ability to solve it. I've hacked away at it for a couple of days, trying various approaches with nsswitch.conf, avahi-daemon, resolv.conf, etc., to no avail.

All suggestions are welcome, but I do need to use GNS, which is otherwise working great, so deconfiguring GNS is not an option. Thank you.

Output of route with the "-n" switch (replying to Andrew, thanks!):

BEFORE ORACLE GNS STARTS:

[root@lxcora03 ~]# route -n
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
0.0.0.0         10.207.39.1     0.0.0.0         UG    0      0        0 eth0
10.207.39.0     0.0.0.0         255.255.255.0   U     0      0        0 eth0
172.0.0.0       0.0.0.0         255.0.0.0       U     0      0        0 eth5
172.0.0.0       0.0.0.0         255.0.0.0       U     0      0        0 eth6
192.0.0.0       0.0.0.0         255.0.0.0       U     0      0        0 eth1
192.0.0.0       0.0.0.0         255.0.0.0       U     0      0        0 eth2
192.0.0.0       0.0.0.0         255.0.0.0       U     0      0        0 eth3
192.0.0.0       0.0.0.0         255.0.0.0       U     0      0        0 eth4

AFTER ORACLE GNS STARTS (note that the link-local 169.254.x.x routes have appeared):

[root@lxcora03 ~]# route -n
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
0.0.0.0         10.207.39.1     0.0.0.0         UG    0      0        0 eth0
10.207.39.0     0.0.0.0         255.255.255.0   U     0      0        0 eth0
169.254.0.0     0.0.0.0         255.255.192.0   U     0      0        0 eth1
169.254.64.0    0.0.0.0         255.255.192.0   U     0      0        0 eth2
169.254.128.0   0.0.0.0         255.255.192.0   U     0      0        0 eth3
169.254.192.0   0.0.0.0         255.255.192.0   U     0      0        0 eth4
172.0.0.0       0.0.0.0         255.0.0.0       U     0      0        0 eth5
172.0.0.0       0.0.0.0         255.0.0.0       U     0      0        0 eth6
192.0.0.0       0.0.0.0         255.0.0.0       U     0      0        0 eth1
192.0.0.0       0.0.0.0         255.0.0.0       U     0      0        0 eth2
192.0.0.0       0.0.0.0         255.0.0.0       U     0      0        0 eth3
192.0.0.0       0.0.0.0         255.0.0.0       U     0      0        0 eth4
[root@lxcora03 ~]# 
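To check whether the new 169.254.x.x routes could steer DNS or SSH traffic away from eth0, recall that the kernel picks the most specific (longest-prefix) route containing the destination. On the node itself, `ip route get 8.8.8.8` asks the kernel directly; the Python 3 sketch below only mimics that lookup offline against the `route -n` output above. It ignores metrics and policy routing, so treat it as an approximation; parse_route_n and pick_route are hypothetical helper names.

```python
import ipaddress

# The "AFTER ORACLE GNS STARTS" table from above, as printed by `route -n`.
ROUTE_N_AFTER = """\
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
0.0.0.0         10.207.39.1     0.0.0.0         UG    0      0        0 eth0
10.207.39.0     0.0.0.0         255.255.255.0   U     0      0        0 eth0
169.254.0.0     0.0.0.0         255.255.192.0   U     0      0        0 eth1
169.254.64.0    0.0.0.0         255.255.192.0   U     0      0        0 eth2
169.254.128.0   0.0.0.0         255.255.192.0   U     0      0        0 eth3
169.254.192.0   0.0.0.0         255.255.192.0   U     0      0        0 eth4
172.0.0.0       0.0.0.0         255.0.0.0       U     0      0        0 eth5
172.0.0.0       0.0.0.0         255.0.0.0       U     0      0        0 eth6
192.0.0.0       0.0.0.0         255.0.0.0       U     0      0        0 eth1
192.0.0.0       0.0.0.0         255.0.0.0       U     0      0        0 eth2
192.0.0.0       0.0.0.0         255.0.0.0       U     0      0        0 eth3
192.0.0.0       0.0.0.0         255.0.0.0       U     0      0        0 eth4
"""


def parse_route_n(text):
    """Hypothetical helper: (network, iface) pairs from `route -n` output, in table order."""
    routes = []
    for line in text.splitlines():
        fields = line.split()
        # Data rows have 8 columns and a dotted-quad destination; this
        # skips the title line, the column header and shell prompts.
        if len(fields) != 8 or fields[0].count(".") != 3:
            continue
        dest, _gateway, genmask, _flags, _metric, _ref, _use, iface = fields
        routes.append((ipaddress.IPv4Network(f"{dest}/{genmask}"), iface))
    return routes


def pick_route(routes, destination):
    """Hypothetical helper: longest-prefix match. The most specific network
    containing the destination wins; equal prefixes keep the earlier row,
    a simplification (the kernel also weighs metrics and routing policy)."""
    addr = ipaddress.IPv4Address(destination)
    best = None
    for net, iface in routes:
        if addr in net and (best is None or net.prefixlen > best[0].prefixlen):
            best = (net, iface)
    return best
```

Run against the after-GNS table, 8.8.8.8 still matches the default route via eth0 and 10.207.39.1 / 10.207.39.3 match 10.207.39.0/24 on eth0, because none of the 169.254.x.x/18 blocks overlap those addresses.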

resolv.conf (same before and after GNS starts)

[root@lxcora03 ~]# cat /etc/resolv.conf
options attempts:2 timeout:1 
; generated by /sbin/dhclient-script
search vmem.org gns1.vmem.org
nameserver 10.207.39.3
nameserver 10.207.39.1

UPDATE December 23: taking a hint from Andrew's reply, I worked on /etc/resolv.conf and believe I have at least a workaround that meets my requirement, although it does not explain the cause of the issue. Here it is.

For some reason, after the Oracle RAC GNS cluster has fully started, resolution of both local and www addresses stops working with the /etc/resolv.conf shown above, so some resolution path evidently gets broken. At the risk of saying something dumb, the routes shown above seem to have nothing to do with this issue: I manually deleted them after the Oracle stack was completely up, and it had no effect; DNS resolution was still broken. If that's utterly idiotic, please remember I'm an Oracle DBA first and an amateur network admin a distant second (but improving all the time!).

What I did find is that the following /etc/resolv.conf resolves everything I need: external addresses, local domain addresses, and RAC cluster addresses. Here is the /etc/resolv.conf that works:

[root@lxcora02 ~]# cat /etc/resolv.conf

options attempts:2 timeout:1
; generated by /sbin/dhclient-script
search vmem.org gns1.vmem.org
nameserver 10.207.39.1
nameserver 8.8.8.8
nameserver 10.207.39.3

[root@lxcora02 ~]# 

I generate this /etc/resolv.conf at boot time (these are LXC Linux containers, by the way) using this dhclient.conf:

[root@lxcora02 ~]# cd /etc/dhcp
[root@lxcora02 dhcp]# cat dhclient.conf
append domain-name-servers 8.8.8.8, 10.207.39.3;
append domain-name " gns1.vmem.org";
[root@lxcora02 dhcp]# 
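How these two `append` statements produce the working file can be modelled in a few lines. In dhclient.conf, `append` places the locally configured value after whatever the DHCP server supplied; for the string option domain-name the two strings are concatenated, which is why the leading space in " gns1.vmem.org" matters. One caveat: the glibc resolver only reads the first three nameserver lines (MAXNS is 3), so if the lease ever carried two servers, 10.207.39.3 would be pushed to fourth place and silently ignored. This Python 3 sketch assumes the lease supplies only 10.207.39.1 and the domain vmem.org, as the working file suggests; the function names are hypothetical.

```python
MAXNS = 3  # glibc's stub resolver reads at most three "nameserver" lines


def append_option_list(lease_values, appended_values):
    """Hypothetical model of `append domain-name-servers ...;`: the servers
    from the DHCP lease stay in front, the appended ones follow."""
    return list(lease_values) + list(appended_values)


def append_option_string(lease_value, appended_value):
    """Hypothetical model of `append domain-name "...";`: string options are
    concatenated, so the leading space yields a two-entry search list."""
    return (lease_value + appended_value).split()


def used_by_resolver(nameservers):
    """Only the first MAXNS nameservers are ever queried by glibc."""
    return nameservers[:MAXNS]


# Assumption: the lease supplies 10.207.39.1 and vmem.org.
servers = append_option_list(["10.207.39.1"], ["8.8.8.8", "10.207.39.3"])
search = append_option_string("vmem.org", " gns1.vmem.org")
```

With that assumption, servers comes out as 10.207.39.1, 8.8.8.8, 10.207.39.3 and search as vmem.org, gns1.vmem.org, matching the resolv.conf above.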

Just for the record, the final nameserver, 10.207.39.3, serves the GNS "delegated" subdomain (see the Oracle Grid Naming Service documentation for details). The nameserver 10.207.39.1 is my local nameserver for my local-only vmem.org domain. Finally, nameserver 8.8.8.8 handles www resolution, such as google.com.

One other note: the ORDER of nameservers MATTERS in this case. I tried different orders, and only the order shown gave all the required resolutions. Putting 8.8.8.8 first, for example, resolves google.com but breaks local domain resolution, so this particular order, found by trial and error, is the one that makes every lookup succeed.
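A plausible reason the order matters: the glibc stub resolver (used by ping, ssh and anything else that goes through getaddrinfo with the dns NSS source) asks the nameservers in the order listed and moves to the next one only when a server times out or answers with an error such as SERVFAIL, NOTIMP or REFUSED. A "no such name" (NXDOMAIN) answer is accepted as final, so a public server listed first can answer NXDOMAIN for a private vmem.org name and the local server is never consulted, which fits what happened with 8.8.8.8 in first place. The Python 3 sketch below models only that failover rule (no search list, no caching, no `rotate` option); the function name, the host host1.vmem.org, the 203.0.113.10 address and all the canned answers are hypothetical.

```python
# Outcomes that make the stub resolver try the next nameserver.
# NXDOMAIN is deliberately absent: "no such name" is accepted as final.
RETRY_NEXT = {"TIMEOUT", "SERVFAIL", "NOTIMP", "REFUSED"}


def stub_resolve(name, nameservers, answers, attempts=2):
    """Hypothetical, simplified model of the stub resolver's failover.

    `answers` maps (nameserver, name) to an address string or to one of
    NXDOMAIN / SERVFAIL / NOTIMP / REFUSED; a missing key means the server
    never replies. The list is walked in order, up to `attempts` times
    (as in `options attempts:2`). Returns (deciding_server, result)."""
    for _ in range(attempts):
        for ns in nameservers:
            result = answers.get((ns, name), "TIMEOUT")
            if result not in RETRY_NEXT:
                return ns, result  # an address, or a final NXDOMAIN
    return None, "FAILED"


# Hypothetical canned answers for illustration only.
ANSWERS = {
    ("8.8.8.8", "host1.vmem.org"): "NXDOMAIN",
    ("10.207.39.1", "host1.vmem.org"): "10.207.39.50",
    ("8.8.8.8", "google.com"): "203.0.113.10",
    ("10.207.39.1", "google.com"): "REFUSED",
}
```

Under these made-up answers, stub_resolve("host1.vmem.org", ["8.8.8.8", "10.207.39.1"], ANSWERS) stops at 8.8.8.8 with NXDOMAIN, while listing 10.207.39.1 first returns its address and still lets google.com fall through to 8.8.8.8.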

Now all the DNS resolutions work correctly, but as mentioned, this doesn't explain why starting the Oracle RAC cluster stack breaks resolution with the original resolv.conf.

Thanks, Andrew, for the response; it got me chipping away at this again, and I'll accept a "workaround consolation prize" that actually works any day, and twice on Sunday.

gstanden