Summary

I have two machines in an NLB cluster. If I shut down one machine (to simulate failure), the second doesn't take up the load. I'm looking for help in diagnosing the reason for this.

Details

I have built a test/staging system consisting of two network-load-balanced hosts. The hosts are actually VMs running under VMware Server. Each host is running Windows 2003 Server Enterprise with SP2 applied, and each has two NICs. They are newly-built and have minimal config changes apart from installing IIS6.

IP addresses are as follows:

  • Host 1: Dedicated: 192.168.0.140 Cluster: 192.168.0.141

  • Host 2: Dedicated: 192.168.0.142 Cluster: 192.168.0.143

  • Cluster IP address: 192.168.0.144

  • Subnet mask: 255.255.255.0
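As a quick sanity check on the addressing above, here is a minimal Python sketch that confirms all five addresses fall in the same subnet and that none is duplicated. The dictionary labels and function names are made up for illustration; only the addresses and mask come from the list above.

```python
import ipaddress

# Addresses and mask copied from the list above; the labels are illustrative only.
SUBNET_MASK = "255.255.255.0"
ADDRESSES = {
    "host1_dedicated": "192.168.0.140",
    "host1_cluster": "192.168.0.141",
    "host2_dedicated": "192.168.0.142",
    "host2_cluster": "192.168.0.143",
    "cluster_ip": "192.168.0.144",
}


def networks_for(addresses, mask):
    """Map each label to the network its address belongs to under the given mask."""
    return {
        label: ipaddress.ip_interface(f"{addr}/{mask}").network
        for label, addr in addresses.items()
    }


def same_subnet(addresses, mask):
    """True if every address lands in one and the same network."""
    return len(set(networks_for(addresses, mask).values())) == 1


def has_duplicates(addresses):
    """True if any IP address appears more than once."""
    values = list(addresses.values())
    return len(values) != len(set(values))


if __name__ == "__main__":
    print("same subnet:", same_subnet(ADDRESSES, SUBNET_MASK))
    print("duplicates:", has_duplicates(ADDRESSES))
```

This only validates the plan on paper; it does not inspect what is actually configured on either host.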

On each host I have set the connection binding order so that the dedicated connection is first.

The cluster is configured to use unicast because I need communication between hosts using the dedicated NICs and I don't have a suitable router for multicast. Host 1 is priority 1, host 2 is priority 2. Weights are set to "Equal".

There is a single port rule:

  • All cluster IP addresses
  • Port range 80 to 80
  • All protocols
  • Multiple host filtering with no affinity

There were no problems creating the cluster and it converges OK. I can ping the cluster address, and HTTP requests to that address return the expected result. I do this from a separate machine, always using the IP address.
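The check from the separate machine can be scripted so that the moment the cluster address stops answering is visible. The following is a rough Python sketch rather than a proper monitoring tool: it only tests whether a TCP connection to port 80 on the cluster address succeeds, and the function name, timeout, and polling interval are arbitrary choices.

```python
import socket
import time


def tcp_port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


if __name__ == "__main__":
    CLUSTER_IP = "192.168.0.144"  # cluster address from the question
    # Poll once a second for a minute; shut down a host while this runs.
    for _ in range(60):
        state = "up" if tcp_port_open(CLUSTER_IP, 80) else "DOWN"
        print(time.strftime("%H:%M:%S"), state)
        time.sleep(1)
```

Running this while shutting down one host at a time shows whether the cluster address recovers after convergence or stays down.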

Problem: When I shut down host 1 (to simulate host failure), I would expect host 2 to respond to pings and HTTP requests on the cluster address, but that isn't happening. It looks like host 2 isn't doing anything.

Question: Can anyone suggest how I can troubleshoot this? What am I missing?

I have checked the following:

  • IP addresses and subnet masks are set as above. Dedicated connections have gateway and DNS addresses specified; cluster connections don't.
  • MAC addresses for the cluster NIC are the same on both machines.
  • The Cluster connection is bound to the appropriate local IP address and the cluster IP address.
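For the MAC address check, it helps to know what value to expect. As far as I understand Microsoft's NLB documentation, the cluster MAC is derived from the cluster IP address: `02-BF` followed by the four IP octets in unicast mode, and `03-BF` followed by the same octets in multicast mode. The Python sketch below computes that value under this assumption; the function name is made up, and the result should be compared with what NLB Manager reports rather than taken as authoritative.

```python
import ipaddress

# Assumed prefixes for the NLB-generated cluster MAC address.
_PREFIXES = {
    "unicast": (0x02, 0xBF),
    "multicast": (0x03, 0xBF),
}


def nlb_cluster_mac(cluster_ip, mode="unicast"):
    """Return the expected cluster MAC for an IPv4 cluster address.

    Assumes the prefix-plus-IP-octets convention described above.
    """
    if mode not in _PREFIXES:
        raise ValueError(f"unknown mode: {mode!r}")
    octets = ipaddress.IPv4Address(cluster_ip).packed  # four raw bytes
    return "-".join(f"{b:02X}" for b in (*_PREFIXES[mode], *octets))


if __name__ == "__main__":
    print(nlb_cluster_mac("192.168.0.144", "unicast"))    # 02-BF-C0-A8-00-90
    print(nlb_cluster_mac("192.168.0.144", "multicast"))  # 03-BF-C0-A8-00-90
```

In unicast mode both hosts present this same MAC address on their cluster NICs, so the switch (or virtual switch) in between has to cope with one MAC appearing on several ports.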

(I'm a developer, not an IT person, so apologies if my terminology is wrong or inexact)

andyjohnson
  • What happens if you shut down host 2 instead of host 1? – Hyppy May 27 '11 at 14:30
  • @Hyppy If I shut down Host 2 then pings and HTTP requests to the cluster address continue to work. – andyjohnson May 27 '11 at 14:37
  • Do you mean Host 2? – Hyppy May 27 '11 at 14:39
  • What are your port rules for the cluster? – joeqwerty May 27 '11 at 14:41
  • @Hyppy Yes, sorry, I meant host 2. I have edited my comment. – andyjohnson May 27 '11 at 14:41
  • @joeqwerty Port rules are: All cluster IP addresses; port range 80 to 80; all protocols; multiple host filtering with no affinity. (I'll add this to the question) – andyjohnson May 27 '11 at 14:45
  • How about the host parameters and load weight? – joeqwerty May 27 '11 at 15:01
  • @joeqwerty Host 1: Interface is "Cluster", priority is 1, IP address is 192.168.0.141, subnet mask 255.255.255.0, default state is "started", load weight is "Equal". Host 2: Interface is "Cluster", priority is 2, IP address is 192.168.0.143, subnet mask 255.255.255.0, default state is "started", load weight is "Equal". (I named the connections "Cluster" and "Dedicated") – andyjohnson May 27 '11 at 15:08

2 Answers

The cause of the problem turned out to be that I was creating the NLB cluster in unicast mode, which has compatibility problems with VMware's virtualised network plumbing. When I re-created the cluster using multicast, it worked correctly.

Microsoft's documentation suggests that using unicast is the simplest option because it requires no router configuration changes. This is not true under VMware, which will require some network configuration changes. Multicast mode seems to just work.

andyjohnson
While Server 1 is still up, do you see any cluster-related traffic on Server 2's cluster NIC?

I suspect that if failover is not working, you may have an issue with the clustering traffic.
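One way to look for that traffic is to capture frames on the cluster NIC and check for NLB heartbeats. To my knowledge these are sent with EtherType 0x886F (Wireshark labels them as MS NLB heartbeats), but treat that value as an assumption. The Python sketch below only classifies raw Ethernet frames you have already captured by some other means; it does no capturing itself and ignores VLAN-tagged frames.

```python
import struct

# Assumption: EtherType used by NLB heartbeat frames.
NLB_HEARTBEAT_ETHERTYPE = 0x886F
ETHERNET_HEADER_LEN = 14  # 6 bytes destination + 6 bytes source + 2 bytes EtherType


def ethertype(frame):
    """Return the EtherType of an untagged Ethernet II frame given as bytes."""
    if len(frame) < ETHERNET_HEADER_LEN:
        raise ValueError("frame shorter than an Ethernet header")
    return struct.unpack("!H", frame[12:14])[0]


def is_nlb_heartbeat(frame):
    """True if the frame carries the assumed NLB heartbeat EtherType."""
    return ethertype(frame) == NLB_HEARTBEAT_ETHERTYPE


def source_mac(frame):
    """Return the frame's source MAC in AA-BB-CC-DD-EE-FF form."""
    if len(frame) < ETHERNET_HEADER_LEN:
        raise ValueError("frame shorter than an Ethernet header")
    return "-".join(f"{b:02X}" for b in frame[6:12])
```

If heartbeats from the other host never show up on Server 2's cluster NIC, the two hosts cannot see each other on that network, which would point at the switch or virtual network rather than at the NLB configuration.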