
I'm new to MSCS and I'm trying to understand something that happened recently.

We have a 2-node 2008 R2 cluster using a witness disk (node and disk majority) over an iSCSI SAN.

I've read over some documentation on MSCS clusters and how they deal with shared storage, specifically:

http://support.microsoft.com/kb/309186

http://technet.microsoft.com/en-us/library/cc770620%28v=ws.10%29.aspx

We shut down both nodes for maintenance (first the passive node, then the active node). Once the maintenance was complete, we booted the previously active node first (i.e. the last one to be shut down was the first one booted).

When the first node came up, the cluster service refused to start and all disks showed as reserved. The event logs complained that the witness disk was reserved. Only when we booted the second node did the cluster actually start as normal.

What's confusing to me is that this behavior looks like a pure node majority cluster. If you boot just one node, no quorum can be attained, so the cluster will listen for additional nodes but not actually start services. I understand this.

But with node and disk majority, the witness disk should act as the tie-breaker. So it seems to me that when the first node boots, it should reserve the witness disk and then start the cluster, since it would hold two of the three votes (itself and the disk). The fact that the cluster can run with just one node (i.e. if one of the two nodes fails) makes it even more confusing that we couldn't start the cluster.

So my questions are: is it expected that a 2-node cluster with node and disk majority will only start when both nodes are booted, and if so, why? And does this also happen with, say, a 4-node cluster?

tmg

2 Answers


A witness disk only provides a vote when a cluster node owns the resource. Ownership of a resource can only be granted by the current owner or decided by a quorum vote. When the cluster is shut down, ownership of all resources is released.

Therefore, the only way to cold start a cluster that uses a witness disk is to bring a majority of the cluster members online. Alternatively, an administrator can force start the cluster with a single node, because that forces the node to take ownership of all resources (see the sketch below).
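For reference, a minimal sketch of a force start on 2008 R2, run from an elevated PowerShell prompt on the surviving node (the node name is a placeholder):

```powershell
Import-Module FailoverClusters

# Force the local node to form the cluster without quorum.
# This overrides the arbitration described above, so only use it
# when you're sure the other node is really down.
# (Equivalent to "net start clussvc /fq" at a cmd prompt.)
Start-ClusterNode -Name "NODE1" -FixQuorum
```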

longneck
  • That makes sense based on the behavior I observed, but it doesn't make sense to me from the MS KB I linked: _"When the cluster service on the **forming node** starts, it first tries to bring online the physical device designated as quorum disk. It executes the disk arbitration algorithm on the quorum disk to gain ownership."_ Am I misunderstanding the KB? Or is it just implied that the disk arbitration algorithm itself requires a majority? – tmg May 11 '13 at 14:32
  • You're right that the KB implies it should bring the disk online, but that is not what I have seen in practice. This might be a case of a bug in the HBA drivers or the storage subsystem such that the quorum disk is not properly released when the cluster shuts down, and therefore the forming node can't bring it online (see the sketch below for clearing a stuck reservation). – longneck May 11 '13 at 19:18
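If a stuck SCSI reservation really is the culprit, a minimal sketch of clearing it with the 2008 R2 FailoverClusters module (the disk number is a placeholder; confirm it in Disk Management first):

```powershell
Import-Module FailoverClusters

# Clear the persistent reservation on the witness disk so the
# forming node can arbitrate for it again. Only run this while
# the cluster is down on all nodes.
Clear-ClusterDiskReservation -Disk 2 -Force
```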

No, a 2-node cluster with node and disk majority can start with just one node active. You need more than 50% of the votes to achieve quorum, and one node plus the witness disk gives you two of the three votes, so the quorum disk should achieve this for you.

I've seen situations where you can't launch Cluster Administrator using the cluster's name, especially when the first node up is the one that didn't previously own the quorum disk. Instead, supply "." as the name of the cluster, i.e. the local machine (see the sketch below).
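The same "." trick works from PowerShell; a minimal sketch, assuming the FailoverClusters module is available:

```powershell
Import-Module FailoverClusters

# "." addresses the cluster service on the local machine, which
# is handy when the cluster name resource isn't online yet:
Get-ClusterGroup -Cluster .
```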

I'd want to check your cluster groupings, and ensure your quorum disk is in the right group; a sketch of how to list them follows.
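Something along these lines should show the quorum configuration and which group each resource sits in (a sketch; resource names will differ per cluster):

```powershell
Import-Module FailoverClusters

# Show the current quorum configuration and witness resource:
Get-ClusterQuorum

# List every resource with its owning group, so you can confirm
# the witness disk is where you expect it:
Get-ClusterGroup | Get-ClusterResource | Format-Table OwnerGroup, Name, ResourceType, State
```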

Next, I'd want to go back to basics and check the multipathing (a quick check below). I haven't touched iSCSI SANs yet; still using good ol' FC/AL.
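If you're using the Microsoft MPIO feature over iSCSI, a quick sanity check of path state (a sketch; mpclaim ships with the MPIO feature):

```powershell
# List the MPIO-claimed disks and the paths to each; a missing or
# failed path can cause exactly this kind of reservation weirdness:
mpclaim -s -d
```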

Finally, don't forget the CLUSTER.LOG file. You can change the verbosity using CLUSTER.EXE. It's actually quite a good log file (see below).
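On 2008 R2 the text log is generated on demand from the ETW trace rather than written continuously; a minimal sketch (the destination folder is a placeholder):

```powershell
Import-Module FailoverClusters

# Dump cluster.log from each node's trace into C:\Temp:
Get-ClusterLog -Destination C:\Temp

# Raise the trace verbosity for future logs (0-5, default 3).
# The cluster.exe equivalent would be "cluster /prop ClusterLogLevel=5".
(Get-Cluster).ClusterLogLevel = 5
```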

If I think of anything else, I'll edit my post.

Simon Catlin