I'm attempting to move the Cluster Core Resources from one node to another in a 4-node WSFC (these are all VMs running on Compute Engine in Google Cloud, Windows Server 2012 R2, each in a different subnet). I'm running:
Move-ClusterGroup -Name "Cluster Group" -Node mynode
And getting the error:
Move-ClusterGroup : An error occurred while moving the clustered role 'Cluster Group'. The operation failed because either the specified cluster node is not the owner of the group, or the node is not a possible owner of the group
I have moved the "Available Storage" cluster group this way successfully; it's just this operation that's failing. The cluster hosts a SQL Server Availability Group, which is online and working as expected and has been failed over multiple times before.
The first time I tried to do this I got an error in the Cluster Events saying:
Cluster resource 'Cluster IP Address [ip of current host]' of type 'IP Address' in clustered role 'Cluster Group' failed.
Based on the failure policies for the resource and role, the cluster service may try to bring the resource online on this node or move the group to another node of the cluster and then restart it. Check the resource and group state using Failover Cluster Manager or the Get-ClusterResource Windows PowerShell cmdlet.
So I checked the IP resources among the cluster core resources and saw that each had all 4 nodes listed as possible owners, even though 3 of those nodes are in the wrong subnet for that IP. It looks like the cluster was trying to bring the current IP up on the target host, which of course didn't work. I removed the 3 wrong-subnet nodes from the possible owners of each IP resource, and since then I've been getting the first error message included above.
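Roughly the sort of thing I did, with placeholder resource/node names (not the real ones):

# List the IP Address resources in the core group and their possible owners
Get-ClusterGroup -Name "Cluster Group" | Get-ClusterResource |
    Where-Object { $_.ResourceType.Name -eq "IP Address" } |
    ForEach-Object { $_.Name; $_ | Get-ClusterOwnerNode }

# Restrict each IP resource to the node whose subnet it actually belongs to
Set-ClusterOwnerNode -Resource "Cluster IP Address 10.x.x.x" -Owners node1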
I ran
Get-ClusterGroup -Name "Cluster Group" | Get-ClusterOwnerNode
which initially returned {} for OwnerNodes. I've since tried adding the current owner and the node I'm trying to move to using Set-ClusterOwnerNode, and I can now see the two nodes I'd expect listed as possible owners, but it has made no difference to the move.
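These are roughly the commands I used for that (node names are placeholders):

# Add the current owner and the intended target as owners of the group
Set-ClusterOwnerNode -Group "Cluster Group" -Owners currentnode, targetnode

# Confirm the list now shows both nodes
Get-ClusterGroup -Name "Cluster Group" | Get-ClusterOwnerNode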
I did wonder whether this could be a DNS issue. I assume it's correct to have just one A record in DNS for the cluster name, holding the currently online IP, and that it gets updated during a move (as opposed to having multiple A records with different IPs). I tried updating the security on that record, briefly giving the two nodes Full Control, and also checked the permissions on the cluster computer object (which already had permissions). I haven't done anything more with AD/DNS because I don't want to break things.
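To be clear, the kind of check I have in mind is something like this (the cluster name is a placeholder, and I'm assuming the core network name resource is still called "Cluster Name"):

# Check what the cluster name currently resolves to
Resolve-DnsName -Name mycluster -Type A

# Check the state of the core network name resource
Get-ClusterResource -Name "Cluster Name"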
I've run cluster validation and it doesn't report anything I would consider a cause for this. There are warnings about: the cluster core IP resources in the different subnets (because they can no longer be owned by every node), the HostRecordTTL and RegisterAllProvidersIP settings, unsigned drivers, and some differences in installed software between the two nodes (just updates that have been applied to the node I'm trying to move to).
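The equivalent PowerShell checks would be something like this (again assuming the core network name resource is called "Cluster Name"):

# Re-run cluster validation from PowerShell
Test-Cluster

# Inspect the settings the validation report warned about
Get-ClusterResource -Name "Cluster Name" | Get-ClusterParameter -Name HostRecordTTL, RegisterAllProvidersIP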