
We are facing a design "challenge" where we are required to set up a storage solution with the following properties:

What we need

  1. HA
    • a scalable storage backend
    • offline/disconnected operation on the client to account for network outages
  2. cross-platform access
    • client-side access, certainly from Windows (probably XP upwards), possibly from Linux
    • a backend that integrates with AD/LDAP for permission management (user/group management, ...)
  3. should work reasonably well over slow WAN-links

Another problem is that we don't really know all possible use cases yet: whether people need concurrent access to shared files or will only be accessing their own files. A possible solution therefore needs to account for concurrent access and for what conflict management would look like from a user's point of view.

This two-year-old blog post sums up the impression I have been getting during the last couple of days of research: there are lots of current übercool projects implementing (non-Windows) clustered, petabyte-capable blob-storage solutions, but none that supports disconnected operation nicely and natively. I am hoping that we have missed an obvious solution.

What we have tried

OpenAFS

We figured that we want a distributed network filesystem with a local cache and tested OpenAFS (which, as the only currently "stable" DFS supporting disconnected operation, seemed the way to go) for a week but there are several problems with it:

  • it's a real pain to set up
  • there are no official RHEL/CentOS packages
  • the package of the current stable version 1.6.5.1 from elrepo causes random kernel panics on fresh installs, which is an absolute no-go
  • Windows support (including the required Kerberos packages) is mystical. The current client for the 1.6 branch does not run on Windows 8; the current client for the 1.7 branch does, but it just crashes randomly. After that experience we didn't even bother testing on XP and Windows 7. Suffice it to say, we couldn't get it working, and the whole setup has been so unstable and complicated to set up that it's just not an option for production.

Samba + Unison

Since OpenAFS was a complete disaster and no other DFS seems to support disconnected operation we went for a simpler idea that would sync files against a Samba server using Unison. This has the following advantages:

  1. Samba integrates with ADs; it's a pain but can be done.
  2. Samba solves the problem of remotely accessing the storage from Windows, but it introduces another SPOF and does not address the actual storage problem. We could probably stick any clustered FS underneath Samba, but then we would need an HA Samba setup on top of it to maintain HA, which probably adds a lot of additional complexity. I vaguely remember trying to implement redundancy with Samba before, and I could not fail over between servers silently.
  3. Even when online, you are working with local files, which will result in more conflicts than necessary; ideally a local cache would only be touched when disconnected.
  4. It's not automatic. We cannot expect users to manually sync their files using the (functional, but not-so-pretty) GTK GUI on a regular basis. I attempted to semi-automate the process using the Windows Task Scheduler, but you cannot really do it in a satisfactory way.
  5. On top of that, the way Unison works makes syncing against Samba a costly operation, so I am afraid that it just doesn't scale very well, or even at all.
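For reference, the semi-automated Unison setup boiled down to something like the following profile. This is a sketch, not our exact configuration; the paths, share and profile names are placeholders:

```
# ~/.unison/work.prf -- hypothetical profile; roots are placeholders
root = C:\Users\alice\Work
root = //fileserver/share/alice
batch = true     # never prompt; required for unattended runs
auto = true      # accept non-conflicting changes automatically
prefer = newer   # naive conflict resolution: the newest copy wins
times = true     # propagate modification times
log = true
```

A scheduled task would then run `unison work -batch` periodically. Note that because the second root is an SMB share, Unison has to scan that replica over the network on every run, which is part of why syncing against Samba is so costly.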

Samba + "Offline Files"

After that we became a little desperate and gave Windows "offline files" a chance. We figured that having something built into the OS would reduce administrative effort, would help with blaming someone else when it's not working properly, and should just work since people have been using it for years. Right? Wrong. We really wanted it to work, but it just doesn't. 30 minutes of copying files around and unplugging network cables/disabling network interfaces left us with

  • undeletable files on the server (!), reported only silently via a tiny notification in the Windows Explorer status bar, which doesn't even open Sync Center when you click it, and
  • conflicts that should not even be conflicts.

In the end, we had one successful sync of a tiny text file, everything else just exploded horribly.

Beyond that, there are other problems:

  • Microsoft admits that "offline files" in Windows XP cannot cope with "large files" and therefore does not cache/sync them at all, which means those files become unavailable if the connection drops.
  • In Windows 7 the feature is only available in the Professional/Ultimate/Enterprise editions.

Summary

Unless there is another fault-tolerant DFS that supports Windows natively, I assume that stacking an HA Samba cluster on top of something like GlusterFS/Lustre/whatnot is the only option, but I hope that I am wrong here. How do other companies provide fault-tolerant network access to redundant storage in a heterogeneous environment with Windows?
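For completeness, the "HA Samba on top of a clustered FS" stack would look roughly like this, assuming GlusterFS as the backend and CTDB for Samba clustering. This is only a sketch; the realm, volume name and node addresses are made up:

```
# /etc/samba/smb.conf (excerpt) -- hypothetical clustered setup
[global]
    clustering = yes           # hand lock/session state over to CTDB
    security = ads             # join the AD domain
    realm = EXAMPLE.COM

[share]
    path = /data
    vfs objects = glusterfs    # talk to the Gluster volume directly
    glusterfs:volume = gv0
    read only = no

# /etc/ctdb/nodes -- private addresses of the Samba cluster nodes
10.0.0.1
10.0.0.2
```

This would give HA for the SMB head (CTDB moves public IPs between nodes on failure), but it still does nothing for disconnected operation on the clients, which is the actual hard requirement here.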

  • You are facing incompatible requirements: a DFS plus a slow, unreliable WAN link. Any DFS uses central or distributed file locking to maintain consistency, and you will always get stuck on your WAN links here ... at least if you want your filesystem to be POSIX-compliant. – Veniamin Oct 24 '13 at 10:13
  • Why isn't anyone thinking about Ceph? – dlyk1988 Oct 24 '13 at 12:01
  • @dsljanus Nobody is *not* thinking about Ceph. Ceph is an object storage like the other ones mentioned, that alone does not solve the problem since there is no Windows filesystem that can access it directly. That's what I tried to express with point (2) under *Samba + Unison*. We could use any type of object storage if we decide to go that route, but that alone does not solve the actual problem. – Adrian Frühwirth Oct 24 '13 at 12:09
  • Maybe I should rephrase that...the problem is not so much to deploy a distributed, fault-tolerant filesystem/storage backend. The problem is to on-top have Windows access to the files and allow Windows clients to work with the files "as if nothing happened" if they get disconnected from the network and have the files resync upon reconnection with no/least possible user interaction. – Adrian Frühwirth Oct 24 '13 at 12:19
  • @AdrianFrühwirth Actually there is the cephwin client under active development. Also, I THINK there is the option to have Ceph act as an iSCSI block device. – dlyk1988 Oct 24 '13 at 12:36
  • @dsljanus I have not heard that Ceph is designed to be distributed across WAN links. It looks more like a LAN-scale distributed solution. I am not entirely sure, so correct me if I am wrong. – Veniamin Oct 24 '13 at 13:29
  • @dsljanus Forgive my ignorance, but the google code repo doesn't have any code checked in and their wiki just talks about future goals - this looks anything but "actively developed". Also, it says "...for the Windows 2008/2008 R2 Operating Systems" - that doesn't sound like it's going to be running on Windows XP/7 clients (plus, I need a stable solution that is rock-solid today, not in the possible future). – Adrian Frühwirth Oct 24 '13 at 13:33

1 Answer


As I commented before, a DFS is not the proper approach for your requirements.

I think the following solution stack is best-suited for you:

  1. Distributed HA object storage, like OpenStack Swift (https://wiki.openstack.org/wiki/Swift).

  2. A Dropbox-like application on top of the object storage (e.g. http://www.gladinet.com/openstack-access-solution.html).

Veniamin