
We have two newly built RHEL 5.6 x86_64 servers that are part of an Oracle Database cluster: CMORAC1 and CMORAC2. Their primary IP addresses are 10.100.9.144 and 10.100.9.154 respectively. The network interfaces on each server are bonded for improved performance and kernel-level load balancing.

On CMORAC1, the hostid command always returns the same value, 640a9009. Converted back into an IP address, this gives 10.100.9.144, which is the correct IP address.
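The hostid/IP conversion can be checked with a short Python sketch. It assumes glibc's behaviour on a little-endian x86_64 machine when /etc/hostid is absent: the address's 32-bit s_addr is rotated by 16 bits, which swaps each pair of octets (10.100.9.144 becomes 64 0a 90 09). The function names are illustrative only.

```python
import ipaddress

# Assumption: mirrors glibc's gethostid() fallback on little-endian x86_64,
# where the 32-bit s_addr is rotated by 16 bits. Names are illustrative.


def ip_to_hostid(ip: str) -> str:
    """Return the 8-hex-digit hostid derived from an IPv4 address."""
    # in_addr.s_addr holds the address bytes in network order; a
    # little-endian CPU reads those bytes as a little-endian integer.
    s_addr = int.from_bytes(ipaddress.IPv4Address(ip).packed, "little")
    hostid = ((s_addr << 16) | (s_addr >> 16)) & 0xFFFFFFFF
    return f"{hostid:08x}"


def hostid_to_ip(hostid: str) -> str:
    """Recover the IPv4 address encoded in a hostid (inverse of ip_to_hostid)."""
    value = int(hostid, 16)
    # Rotating a 32-bit value by 16 bits twice restores the original.
    s_addr = ((value << 16) | (value >> 16)) & 0xFFFFFFFF
    return str(ipaddress.IPv4Address(s_addr.to_bytes(4, "little")))
```

For example, hostid_to_ip("fea9b8fc") gives 169.254.252.184, the link-local address on bond1:1, and ip_to_hostid("10.100.9.154") gives 640a9a09.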

On CMORAC2, however, the hostid command gives very inconsistent results. I ran it 50 times in a loop (sleeping 1 second between iterations), and it returned several different hostids. The unique values returned are:

hostid      IP address
640a4a10    10.100.16.74
640a9909    10.100.9.153
640a9a09    10.100.9.154
640a9b09    10.100.9.155
640a9c09    10.100.9.156
fea9b8fc    169.254.252.184

The hostid should always be 640a9a09, but it's not.
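The 50-iteration loop can be reproduced with the Python sketch below, which tallies how often each distinct hostid appears. It assumes the coreutils hostid binary is on the PATH and needs Python 3.7 or later for capture_output; read_hostid and sample_hostids are hypothetical helper names, and the reader is passed in as a parameter so the tally can be exercised without the binary.

```python
import subprocess
import time
from collections import Counter


def read_hostid() -> str:
    """Run the `hostid` command (assumed to be on PATH) and return its output."""
    result = subprocess.run(["hostid"], capture_output=True, text=True, check=True)
    return result.stdout.strip()


def sample_hostids(read=read_hostid, runs=50, delay=1.0) -> Counter:
    """Call `read` `runs` times, sleeping `delay` seconds between calls,
    and count how often each distinct value appears."""
    counts = Counter()
    for i in range(runs):
        counts[read()] += 1
        if i < runs - 1:  # no need to sleep after the last call
            time.sleep(delay)
    return counts
```

On a healthy server, print(sample_hostids()) should show a single key; on CMORAC2 before the fix it would be expected to show several.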

Here is the ifconfig output from CMORAC2:

bond0     Link encap:Ethernet  HWaddr 78:2B:CB:1A:97:3F  
          inet addr:10.100.9.154  Bcast:10.100.9.255  Mask:255.255.255.128
          inet6 addr: fe80::7a2b:cbff:fe1a:973f/64 Scope:Link
          UP BROADCAST RUNNING MASTER MULTICAST  MTU:1500  Metric:1
          RX packets:2167149212 errors:0 dropped:0 overruns:0 frame:0
          TX packets:2169807434 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0 
          RX bytes:279381053647 (260.1 GiB)  TX bytes:366406519908 (341.2 GiB)

bond0:2   Link encap:Ethernet  HWaddr 78:2B:CB:1A:97:3F  
          inet addr:10.100.9.153  Bcast:10.100.9.255  Mask:255.255.255.128
          UP BROADCAST RUNNING MASTER MULTICAST  MTU:1500  Metric:1

bond0:3   Link encap:Ethernet  HWaddr 78:2B:CB:1A:97:3F  
          inet addr:10.100.9.155  Bcast:10.100.9.255  Mask:255.255.255.128
          UP BROADCAST RUNNING MASTER MULTICAST  MTU:1500  Metric:1

bond0:4   Link encap:Ethernet  HWaddr 78:2B:CB:1A:97:3F  
          inet addr:10.100.9.156  Bcast:10.100.9.255  Mask:255.255.255.128
          UP BROADCAST RUNNING MASTER MULTICAST  MTU:1500  Metric:1

bond1     Link encap:Ethernet  HWaddr 78:2B:CB:1A:97:43  
          inet addr:10.100.16.74  Bcast:10.100.16.79  Mask:255.255.255.248
          inet6 addr: fe80::7a2b:cbff:fe1a:9743/64 Scope:Link
          UP BROADCAST RUNNING MASTER MULTICAST  MTU:9000  Metric:1
          RX packets:517202985 errors:0 dropped:0 overruns:0 frame:0
          TX packets:571091767 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0 
          RX bytes:330164712285 (307.4 GiB)  TX bytes:481545253520 (448.4 GiB)

bond1:1   Link encap:Ethernet  HWaddr 78:2B:CB:1A:97:43  
          inet addr:169.254.252.184  Bcast:169.254.255.255  Mask:255.255.0.0
          UP BROADCAST RUNNING MASTER MULTICAST  MTU:9000  Metric:1

eth0      Link encap:Ethernet  HWaddr 78:2B:CB:1A:97:3F  
          UP BROADCAST RUNNING SLAVE MULTICAST  MTU:1500  Metric:1
          RX packets:1374977659 errors:0 dropped:0 overruns:0 frame:0
          TX packets:1556885797 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000 
          RX bytes:172018406954 (160.2 GiB)  TX bytes:257910742704 (240.1 GiB)
          Interrupt:138 Memory:d6000000-d6012800 

eth1      Link encap:Ethernet  HWaddr 78:2B:CB:1A:97:41  
          UP BROADCAST RUNNING SLAVE MULTICAST  MTU:1500  Metric:1
          RX packets:792171553 errors:0 dropped:0 overruns:0 frame:0
          TX packets:612921637 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000 
          RX bytes:107362646693 (99.9 GiB)  TX bytes:108495777204 (101.0 GiB)
          Interrupt:146 Memory:d8000000-d8012800 

eth2      Link encap:Ethernet  HWaddr 78:2B:CB:1A:97:43  
          UP BROADCAST RUNNING SLAVE MULTICAST  MTU:9000  Metric:1
          RX packets:13570946 errors:0 dropped:0 overruns:0 frame:0
          TX packets:382329420 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000 
          RX bytes:8059679153 (7.5 GiB)  TX bytes:310241198851 (288.9 GiB)
          Interrupt:154 Memory:da000000-da012800 

eth3      Link encap:Ethernet  HWaddr 78:2B:CB:1A:97:45  
          UP BROADCAST RUNNING SLAVE MULTICAST  MTU:9000  Metric:1
          RX packets:503632039 errors:0 dropped:0 overruns:0 frame:0
          TX packets:188762347 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000 
          RX bytes:322105033132 (299.9 GiB)  TX bytes:171304054669 (159.5 GiB)
          Interrupt:162 Memory:dc000000-dc012800 

lo        Link encap:Local Loopback  
          inet addr:127.0.0.1  Mask:255.0.0.0
          inet6 addr: ::1/128 Scope:Host
          UP LOOPBACK RUNNING  MTU:16436  Metric:1
          RX packets:187853342 errors:0 dropped:0 overruns:0 frame:0
          TX packets:187853342 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0 
          RX bytes:158085331402 (147.2 GiB)  TX bytes:158085331402 (147.2 GiB)

The file /etc/hostid is not present on the server, and the NetworkManager service is stopped.

This issue is causing us a lot of problems with the licensing of a piece of software that uses the hostid to generate the license. Because the hostid is not consistent, we can't keep the software licensed. The issue does not occur on its twin, CMORAC1, which, according to the admin who built it, should be configured the same way.

Does anyone have an idea what could be causing this?

Yanick Girouard



We found the issue which was causing this behaviour. It was DNS related.

There were multiple DNS A records (IP addresses) for the same name:

[root@cmorac2 ~]# nslookup cmorac2
Server:         10.100.9.174
Address:        10.100.9.174#53

Name:   cmorac2.cibc.cginet
Address: 10.100.9.156
Name:   cmorac2.cibc.cginet
Address: 10.100.16.74
Name:   cmorac2.cibc.cginet
Address: 169.254.252.184
Name:   cmorac2.cibc.cginet
Address: 10.100.9.153
Name:   cmorac2.cibc.cginet
Address: 10.100.9.154
Name:   cmorac2.cibc.cginet
Address: 10.100.9.155

After the DNS entries were corrected, hostid returned the same value consistently.
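Why DNS matters: with no /etc/hostid, glibc's gethostid() on Linux (as I understand its implementation) calls gethostname(), resolves that name, and builds the hostid from the first IPv4 address in the answer. If the name has several A records and the resolver or DNS server returns them in rotating order, the first address, and therefore the hostid, changes from call to call. The Python sketch below models that fallback under those assumptions; it is not the glibc code itself, and the helper names are hypothetical.

```python
import ipaddress
import socket

# Assumption: a model of glibc's gethostid() fallback (no /etc/hostid) on a
# little-endian machine. Not the glibc source; helper names are hypothetical.


def hostid_from_addresses(addresses):
    """Build a hostid from the FIRST IPv4 address of a resolver answer."""
    if not addresses:
        raise ValueError("resolver returned no IPv4 addresses")
    s_addr = int.from_bytes(ipaddress.IPv4Address(addresses[0]).packed, "little")
    # Rotate the 32-bit value by 16 bits, as the fallback does.
    return f"{((s_addr << 16) | (s_addr >> 16)) & 0xFFFFFFFF:08x}"


def resolved_ipv4(hostname):
    """Return the IPv4 addresses the resolver currently gives for hostname,
    in the order they were received."""
    _name, _aliases, addrs = socket.gethostbyname_ex(hostname)
    return addrs
```

With the records shown above, an answer that begins with 10.100.9.156 yields 640a9c09, while one that begins with 10.100.9.154 yields the expected 640a9a09. Calling resolved_ipv4(socket.gethostname()) a few times is a quick way to spot a host name that resolves to more than one address.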

This is good to know!

Yanick Girouard

I'm sure the Ethernet bonding is causing this problem. One solution might be to simply dedicate eth0: give it an IP address and don't bond it, or perhaps don't even use it. This would probably produce a consistent result. You could install another Ethernet card if you need the ports; they are pretty cheap these days, assuming you have free slots on the motherboard.

I have confirmed on my machines that if a value is placed in /etc/hostid, the hostid command consistently returns that value (in hex). This could be another way to solve your problem.
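A minimal sketch of writing /etc/hostid from Python, assuming glibc reads the file as a single 4-byte integer in native byte order (little-endian on x86_64, hard-coded here as "<I"). The helper names are hypothetical, and writing the default path requires root.

```python
import struct

# Assumption: glibc reads /etc/hostid as one 4-byte integer in native byte
# order; "<I" hard-codes little-endian, which matches x86_64.


def hostid_file_bytes(hostid: str) -> bytes:
    """Encode an 8-hex-digit hostid as the 4 bytes expected in /etc/hostid."""
    return struct.pack("<I", int(hostid, 16))


def write_hostid_file(hostid: str, path: str = "/etc/hostid") -> None:
    """Write the hostid file; the default path needs root privileges."""
    with open(path, "wb") as f:
        f.write(hostid_file_bytes(hostid))
```

If the format assumption holds, running write_hostid_file("640a9a09") as root on CMORAC2 should make hostid report 640a9a09 regardless of what DNS returns.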

mdpc
  • Thing is, why is it working fine on CMORAC1 which has the SAME bonding configuration? I'd be more interested to know the actual cause of this rather than trying to find a workaround. – Yanick Girouard Nov 01 '11 at 22:34