We have a couple dozen Windows Server 2008 and 2008 R2 Enterprise clusters that host SQL Server 2008 and 2008 R2 Enterprise/Datacenter. In the past we had many issues with random failovers and "Network is Partitioned" errors on several servers on the other side of the globe. These were mostly resolved by updating NIC drivers and uninstalling Forefront Endpoint Protection (not sure how that played into everything, but it helped).
Fast forward six months to November: we are getting constant alerts from SCOM and in the Event Log that the clusters (two in particular) are failing with "Network is Partitioned" errors several times a week, but no failure actually occurs. SQL Server stays up and running, and no interruption in service is noticed on the web front ends. The errors seem to originate on the passive node and propagate through the network (we receive the first alerts from the passive node, then the active node, then the web front end), but all nodes, network adapters, disks, applications, IPs, and websites remain functional. We cannot find the reason these errors keep popping up when nothing appears to be wrong with the cluster, the network, or anything else. Any ideas about the cause, or a direction we could take to investigate, would be great.