
We run a popular web app, and our backend file storage exists on a file server, replicated to another file server for failover with DFSR. We are reaching the theoretical limits of DFSR, and therefore need to start looking at sharding our storage.

What is the best way to go about sharding? I know we could abstract out our file storage at the application level, but, among other things, my mind boggles at how third-party controls that interact with the filesystem would be able to hook into that abstraction. What are the best techniques you've seen, or can think of? Assume for now that we have a directory structure like /Customers/bikesystems, /Customers/10degrees, etc.: one big directory of customer data, with each customer having its own folder in that /Customers directory.
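To make the abstraction idea concrete, here's roughly what I have in mind: a minimal sketch (Python, with made-up server and share names) of a resolver that decides which share a customer lives on but still hands back an ordinary path, which is the only way I can see third-party controls that expect a real filesystem path continuing to work:

```python
# Minimal sketch (all server/share names are made up): the abstraction
# decides which share a customer lives on, but still returns a plain
# path that anything expecting a real filesystem path can consume.
from pathlib import PureWindowsPath

class CustomerStorageResolver:
    def __init__(self, shard_map: dict[str, str]):
        # shard_map: first character of customer ID -> UNC root for that shard
        self.shard_map = shard_map

    def root_for(self, customer_id: str) -> PureWindowsPath:
        """Return the physical folder for a customer as a plain path."""
        share = self.shard_map[customer_id[0].lower()]
        return PureWindowsPath(share) / customer_id

resolver = CustomerStorageResolver({
    "b": r"\\fs01\Customers",
    "1": r"\\fs02\Customers",
})
print(resolver.root_for("bikesystems"))  # \\fs01\Customers\bikesystems
```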

My initial thinking has me breaking up that larger customer directory into a more hierarchical structure, as in /Customers/b/bikesystems and /Customers/1/10degrees (taking the first letter or number of each customer ID), which gives me the ability to create a DFS namespace for each first character of a customer ID (which is [a-z0-9] for you regex folks): potentially 36 DFS namespaces. From there I can shuffle those namespaces around to various servers as any one of them outgrows its capacity. And this would give me a lot more breathing room before I reach those theoretical DFSR limits.
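To illustrate the mapping, here's a rough sketch (Python again; the one-namespace-per-character naming like `Customers-b` is made up) of how a customer ID would resolve to its first-character namespace:

```python
import re
from pathlib import PureWindowsPath

FIRST_CHAR = re.compile(r"^[a-z0-9]")  # the 36 possible shard keys

def shard_path(customer_id: str, root: str = r"\\example.local") -> PureWindowsPath:
    r"""Map a customer ID to its first-character DFS namespace,
    e.g. bikesystems -> \\example.local\Customers-b\bikesystems."""
    cid = customer_id.lower()
    if not FIRST_CHAR.match(cid):
        raise ValueError(f"customer ID must start with [a-z0-9]: {customer_id!r}")
    return PureWindowsPath(root) / f"Customers-{cid[0]}" / cid

print(shard_path("bikesystems"))  # \\example.local\Customers-b\bikesystems
print(shard_path("10degrees"))    # \\example.local\Customers-1\10degrees
```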

Is that the best approach?

I know we could be looking at Linux or other enterprise-level storage systems (Isilon, etc.). For this discussion, however, I'd like to keep things limited to Windows for now. Unless, of course, you have a burning desire to extol the benefits of a different solution altogether, and you would like to help me see the light!

Ken Randall

1 Answer


That's a heck of a site to have depending on just two servers!

What are the supported limits of DFS Replication?

The following list provides a set of scalability guidelines that have been tested by Microsoft on Windows Server 2008 R2 and Windows Server 2008:

* Size of all replicated files on a server: 10 terabytes.
* Number of replicated files on a volume: 8 million.
* Maximum file size: 64 gigabytes.
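
If you want to see how close you already are, something like this quick tally (a Python sketch; the path is an example) gives you the file count and total size to compare against those numbers:

```python
# Rough sketch (path is an example): tally files and bytes under a
# replicated folder to compare against the 10 TB / 8 million-file guidance.
import os

def measure(root: str) -> tuple[int, int]:
    files, total = 0, 0
    for dirpath, _dirs, names in os.walk(root):
        for name in names:
            try:
                total += os.path.getsize(os.path.join(dirpath, name))
                files += 1
            except OSError:
                pass  # skip files that vanish or deny access mid-scan
    return files, total

count, size = measure(r"D:\Customers")
print(f"{count:,} files, {size / 1024**4:.2f} TiB")
```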
TessellatingHeckler
  • Yep, those are the limits I'm talking about alright! – Ken Randall Aug 18 '11 at 01:55
  • Then ... good luck! (Might want to edit your Q to answer things like: Do you have enough storage space to handle more data on these servers, or are you looking to add more servers, or to add new but distinct storage on these servers? What kind of data, and is it easy to partition in any way (e.g. older data)? What kind of performance is needed (e.g. how quickly must it synchronise)?) – TessellatingHeckler Aug 18 '11 at 02:29