
I have two Windows Server 2012 R2 nodes running as a Windows file server cluster. The cluster works perfectly, with almost instantaneous failover of the iSCSI drives and SMB shares between nodes.

I also have another Windows Server 2012 R2 file server, in a remote location, with a pre-seeded copy of the data. I used the new DFSR database export (cloning) feature, following this guide almost to the letter: http://blogs.technet.com/b/filecab/archive/2013/08/21/dfs-replication-initial-sync-in-windows-server-2012-r2-attack-of-the-clones.aspx

When replication first begins, there are no problems. During initial replication the file server resolves conflicts between files without issues. I do, however, see a significant slowdown in network performance for clients accessing the SMB shares. It's random and does not affect everyone at once. The whole time, resource usage on both file servers is minimal.

Then, sometimes at random, the entire DFSR service enters a failed state on the cluster, causing a cascade effect that makes all SMB shares drop out and become unresponsive to everyone on the network. On top of that, it prevents the SMB shares from failing over to the other cluster node. The DFSR service gets stuck and cannot be stopped via services.msc; you're forced to kill the actual process.

I end up having to stop the DFSR service manually, and most of the time that doesn't work, which leaves me killing the DFSR process.

I have tried recreating the DFSR replication group, replicated folder, and connections multiple times. That included completely removing the DFSR data from AD and from the System Volume Information folder.

Even with a brand new setup built from scratch, the problem continues. Remember, the data is pre-seeded and I am using DFSR database cloning. Has anyone else experienced problems like this?

  • This is a cluster running on VMware, following the best practices from Microsoft and VMware.

  • It's about 3 TB of data, many small files, pre-seeded.

  • I have been able to reach 100% mesh between the data on the servers, only for the problem described above to occur when a large change happens.

  • DFSR is extremely unstable.

  • Proper replication bandwidth limits are set; the data is transmitted between the two servers through a site-to-site VPN.

  • I am using DFS Namespaces (unrelated to the DFSR folder), with a domain-based path for DFS.

Sarge
  • Crash is the wrong term here. It may be unresponsive, but it hasn't crashed. It's impossible to answer without cluster- and DFSR-related debug logs and likely some dumps of the relevant processes. You should raise a case with Microsoft if you can. – maweeras Apr 06 '14 at 08:18
