I have two Windows Server 2012 R2 servers running as a Windows file server cluster. The cluster works perfectly, with almost instantaneous failover of the iSCSI drives and SMB shares between nodes.
I also have another Windows Server 2012 R2 file server in a remote location with a pre-seeded copy of the data. I used the new DFSR database export (cloning) feature, following this guide almost to the letter: http://blogs.technet.com/b/filecab/archive/2013/08/21/dfs-replication-initial-sync-in-windows-server-2012-r2-attack-of-the-clones.aspx
When replication first begins there are no problems. During initial replication the file server resolves conflicts between files without issues. I do, however, see a significant slowdown in network performance for clients accessing the SMB shares. It's random and does not affect everyone at once. The entire time, resource usage on both file servers is minimal.
Then, sometimes at random, the entire DFSR service enters a failed state on the cluster, causing a cascade effect that makes all SMB shares drop out and become unresponsive to everyone on the network. On top of that, it prevents the SMB shares from failing over to the other cluster node. The DFSR service gets stuck and cannot be stopped via services.msc. You're forced to kill the actual process.
I end up having to stop the DFSR service manually, and most of the time that doesn't work, which leaves me killing the DFSR process.
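For anyone trying to reproduce the workaround, here is a minimal sketch of the stop-then-kill sequence in Python, assuming it is run from an elevated prompt on the affected node. The helper names (`parse_pid`, `stop_or_kill`) and the 60-second timeout are made up for illustration; `net stop`, `sc queryex`, and `taskkill` are the standard Windows commands, and `DFSR` is the service name of DFS Replication.

```python
import re
import subprocess
from typing import Optional

SERVICE = "DFSR"  # service name of DFS Replication


def parse_pid(sc_output: str) -> Optional[int]:
    """Return the PID reported by `sc queryex`, or None if absent or 0."""
    match = re.search(r"^\s*PID\s*:\s*(\d+)", sc_output, re.MULTILINE)
    if not match:
        return None
    pid = int(match.group(1))
    return pid or None  # sc reports PID 0 once the service has stopped


def stop_or_kill(timeout_s: int = 60) -> None:
    """Try a clean stop first; kill the process only if it is still alive."""
    try:
        # `net stop` waits for the service to stop (hypothetical 60 s cap here).
        subprocess.run(["net", "stop", SERVICE], timeout=timeout_s, check=False)
    except subprocess.TimeoutExpired:
        pass  # service is hung; fall through to the kill path

    query = subprocess.run(
        ["sc", "queryex", SERVICE], capture_output=True, text=True, check=False
    )
    pid = parse_pid(query.stdout)
    if pid is not None:
        # Last resort: forcibly terminate the stuck process.
        subprocess.run(["taskkill", "/PID", str(pid), "/F"], check=False)
```

Calling `stop_or_kill()` only automates what I do by hand; it does nothing about the underlying cause of the hang.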
I have tried recreating the DFSR replication group, folders, and connections multiple times, including completely removing the DFSR data from AD and from the System Volume Information folder.
Even with a brand-new setup from scratch the problem continues. Remember, the data is pre-seeded and I am using the DFSR database clone. Has anyone else experienced problems like this?
This is a cluster running on VMware, following Microsoft and VMware best practices.
It's about 3 TB of data, many small files, pre-replicated.
I have been able to get the data on the servers 100% in sync, only for the problem described above to occur when a large change happens.
DFSR is being extremely unstable.
Proper replication bandwidth limits are set, and the data is transmitted between the two servers through a site-to-site VPN.
I am using DFS Namespaces (unrelated to the DFSR folder), and I am using the domain-based path for DFS.