
We have a very large Windows file store (several terabytes and tens of millions of files) that I want to keep continuously replicated to another server, as near to real-time as possible. I'm looking for suggestions on tools that can make this happen.

So far I've come up with:

  • Move it to a NAS or SAN and share the files. Won't work, not an option for us for at least six months.
  • Use robocopy with /MON. I'm worried about the drain on the replication source as it rescans the entire tree every cycle. (/MON waits for file changes before triggering a new pass, but each pass is still a full scan; the change notifications aren't actually used to narrow what gets copied.)
  • Rsync. No better than robocopy here; it also rescans the entire tree on every run.
  • DFS. I've heard very bad things from people at MS about DFS for stores with millions of small files. Probably half our files are very small.
  • Hybrid. Write my own tool, or find one online, that uses file watchers to target only the files that need copying. The watcher's buffer can overflow, though, so a nightly robocopy would still run to pick up anything that was missed. (A rough sketch of this approach follows the list.)
  • Backup-based. Some kind of crazy scripted backup/restore thing where the backup software can do very fast incremental snapshots.
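
To make the hybrid idea concrete, here's a minimal sketch of the shape I have in mind, assuming Python with the third-party watchdog package for the watcher half and a plain robocopy /MIR pass for the nightly sweep. The paths are placeholders, and a real tool would need retry, open-file, and logging handling:

    # Rough sketch only: mirror changed files as they happen, and rely on a
    # scheduled full robocopy pass to catch anything the watcher missed
    # (e.g. when the change notification buffer overflows).
    import shutil
    import subprocess
    from pathlib import Path

    from watchdog.observers import Observer
    from watchdog.events import FileSystemEventHandler

    SOURCE = Path(r"D:\depot")       # placeholder source tree
    MIRROR = Path(r"\\spare\depot")  # placeholder replication target

    class MirrorHandler(FileSystemEventHandler):
        """Copy each created/modified file to the mirror as soon as it changes."""

        def on_created(self, event):
            self._copy(event)

        def on_modified(self, event):
            self._copy(event)

        def _copy(self, event):
            if event.is_directory:
                return
            src = Path(event.src_path)
            dst = MIRROR / src.relative_to(SOURCE)
            dst.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(src, dst)  # real tool: retries, locked-file handling, logging

    def nightly_sweep():
        """Full robocopy pass, run from Task Scheduler, to pick up missed files."""
        subprocess.run(["robocopy", str(SOURCE), str(MIRROR),
                        "/MIR",           # mirror the tree
                        "/R:2", "/W:5"])  # short retries so locked files don't stall it

    if __name__ == "__main__":
        observer = Observer()
        observer.schedule(MirrorHandler(), str(SOURCE), recursive=True)
        observer.start()
        try:
            observer.join()
        except KeyboardInterrupt:
            observer.stop()
            observer.join()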

Any ideas would be greatly appreciated, thanks!

scobi
  • What are your reasons for doing this replication? There may be a better way to accomplish what your ultimate goal is. – phoebus Nov 10 '10 at 17:32
  • You mean DFS replication like the OOOOOLD one or DFS-R? – TomTom Nov 10 '10 at 19:11
  • I mean DFS replication like the new one - we're on Server 2008R2. When insiders in multiple different teams at MS say "don't use it" it makes me nervous. And the reason for doing this replication is to keep a spare Perforce server running. They have a replication system for their database, but not the file revisions. We need the spare for failover as well as read-only "offline" access for expensive queries, backup, and validation that we do not want the main server to get hit with. Ideally we'd solve this with a SAN but that is just not an option now. – scobi Nov 10 '10 at 22:32
  • Wow, how did you end up with a layout like that without a SAN involved? How about something like DoubleTake or SteelEye? – tony roth Nov 11 '10 at 02:34

2 Answers


Whatever you use, be sure that it uses the NTFS change journal so that it isn't effectively "polling" the filesystem. robocopy /MON, for example, doesn't use the change journal, so it ends up rescanning ("polling") the whole tree on every pass.

I have a customer using SureSync to replicate a few million large and small files (around 1.5TB) to a "hot standby" file server. It works well for them. It uses the NTFS change journal to keep abreast of changes to the volume, supports delta compression, and can use a dedicated network interface for inter-server communication. (I have no affiliation with the company; I just like the tool.)
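
To illustrate what reading the change journal buys you (this isn't SureSync's code, just a rough Python/ctypes sketch that assumes an NTFS volume with an active USN journal and an elevated prompt): the journal is a per-volume log of changes, so a replicator only has to read the records written since its last checkpoint instead of re-enumerating tens of millions of files.

    # Illustration only: query the NTFS USN change journal on a volume.
    # A journal-aware replicator reads the records between its last checkpoint
    # and NextUsn instead of walking the whole directory tree.
    import ctypes
    import ctypes.wintypes as wt

    FSCTL_QUERY_USN_JOURNAL = 0x000900F4  # from winioctl.h

    class USN_JOURNAL_DATA(ctypes.Structure):            # USN_JOURNAL_DATA_V0
        _fields_ = [("UsnJournalID",    ctypes.c_ulonglong),
                    ("FirstUsn",        ctypes.c_longlong),
                    ("NextUsn",         ctypes.c_longlong),
                    ("LowestValidUsn",  ctypes.c_longlong),
                    ("MaxUsn",          ctypes.c_longlong),
                    ("MaximumSize",     ctypes.c_ulonglong),
                    ("AllocationDelta", ctypes.c_ulonglong)]

    k32 = ctypes.windll.kernel32
    volume = k32.CreateFileW(r"\\.\C:",
                             0x80000000,        # GENERIC_READ
                             0x00000003,        # FILE_SHARE_READ | FILE_SHARE_WRITE
                             None, 3, 0, None)  # OPEN_EXISTING
    if volume == -1:
        raise OSError("could not open volume (run elevated)")

    info = USN_JOURNAL_DATA()
    returned = wt.DWORD(0)
    ok = k32.DeviceIoControl(volume, FSCTL_QUERY_USN_JOURNAL,
                             None, 0,
                             ctypes.byref(info), ctypes.sizeof(info),
                             ctypes.byref(returned), None)
    if ok:
        # Every create, write, rename, and delete on the volume advances NextUsn.
        print("Journal ID: %#x  NextUsn: %d" % (info.UsnJournalID, info.NextUsn))
    k32.CloseHandle(volume)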

Evan Anderson
  • Robocopy does poll the file system. The /MON flag only says "wait for another change" but after that it re-runs the full poll. It does not actually use the change info for anything other than to trigger another run. And files are changing a lot more often than once a minute on the server, so /MON doesn't really do anything for us. I will look at SureSync though, thanks for the link. – scobi Nov 10 '10 at 22:34
  • @Scott Bilas: You're saying what I was trying to say, but I didn't write it very clearly. I've edited my answer to make it clearer. – Evan Anderson Nov 11 '10 at 00:21

This is the first time I've heard anyone say negative things about DFS-R. Our experience has only been positive.

In any case, I would start by trying DFS-R, since it requires no additional hardware or software beyond the Windows licenses you already have (assuming you've got Enterprise editions). It's also pretty easy to set up.

The largest DFS-R volume we manage is about 200GB and a little over 1 million files. Obviously this is a lot smaller than what you've got, but it's still fairly sizable. The contents are primarily software installation packages (some of which contain thousands of tiny files). We used to replicate this store with NTFRS and had nothing but problems. We upgraded to DFS-R back when 2003 R2 came out and it was a night and day difference. The servers have since been upgraded to 2008 and are still humming along without a glitch.

You'll definitely want to set up your staging volume on a different set of spindles for performance, and you'll have to configure it fairly large as well. I'm not really an expert on the specifics, though; it likely depends on how large your largest file is and how much churn there is on a regular basis. Microsoft PSS folks could likely provide better advice on this.
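
The rule of thumb I've seen in Microsoft's DFS-R docs is to make the staging quota at least as large as the combined size of the 32 largest files in the replicated folder (verify that against current guidance for your OS version). A rough Python sketch of that calculation, with a placeholder path, would look like this; on a store your size the walk itself will take a while, so you'd run it off-hours just to get a ballpark:

    # Rough sketch: estimate a DFS-R staging quota as the combined size of the
    # 32 largest files in the replicated folder (commonly cited rule of thumb
    # for read-write members; check current Microsoft guidance).
    import heapq
    import os

    REPLICATED_FOLDER = r"D:\depot"  # placeholder path
    N_LARGEST = 32

    sizes = []
    for root, _dirs, files in os.walk(REPLICATED_FOLDER):
        for name in files:
            try:
                sizes.append(os.path.getsize(os.path.join(root, name)))
            except OSError:
                pass  # file vanished or is locked; skip it

    largest = heapq.nlargest(N_LARGEST, sizes)
    quota_mb = sum(largest) // (1024 * 1024) + 1
    print("Suggested staging quota: at least %d MB" % quota_mb)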

So get it set up and see if it performs adequately. If it doesn't, all you've lost is some time.

Ryan Bolger
  • Per Microsoft (see http://technet.microsoft.com/en-us/library/cc773238(WS.10).aspx#BKMK_00), DFS-R is limited to 8 million files per volume. – Evan Anderson Nov 10 '10 at 19:15
  • Well heck. Perhaps the files are grouped logically enough to be split into multiple volumes? – Ryan Bolger Nov 10 '10 at 19:21
  • @EvanAnderson - For those who don't follow your link: at some point in the past 4 years Microsoft updated the file number limit to 70 million: http://technet.microsoft.com/en-us/library/cc773238(WS.10).aspx#BKMK_00 – kevinmicke Nov 06 '14 at 19:33