How to configure GlusterFS for a low-latency read, async write setup with a volatile number of nodes?

Question

I need a storage system with the following requirements:

Dynamic pool of Linux EC2 application servers (1-20) that autoscale daily
Low latency reads
Possibly async writes (propagation delay under 5 minutes)
Most POSIX features, except locking
Each application server can work autonomously so needs all the data locally (replication mode)

Because I want to replicate across many machines (an N-master setup), I think I need async writes, which would be acceptable since the application can tolerate a 5-minute delay in write propagation. But I am not sure how to do this in GlusterFS, or whether it is viable at all. How would you set it up?

Given this setup, let me sneak a few more questions in:

How would GlusterFS manage conflicting async writes? If I'm not too bothered about data loss, what is the best way to resolve these conflicts?
Also, most GlusterFS documentation describes adding and removing bricks manually. Does anyone successfully run a setup where bricks are added and removed automatically, many times a day?
Is there perhaps a better alternative to GlusterFS for these particular requirements?
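
To make the conflict question concrete: if losing the older of two conflicting writes is acceptable, one simple policy is "last writer wins", where the copy with the newest modification time is kept. The sketch below only illustrates that policy; it is not how GlusterFS resolves conflicts, and the `FileVersion` type, its fields, and the node names are hypothetical. It also assumes the clocks on all nodes are reasonably in sync.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FileVersion:
    """One node's copy of a file (hypothetical structure for illustration)."""
    node: str       # name of the node holding this copy
    mtime: float    # modification time, seconds since the epoch
    checksum: str   # content hash of this copy


def pick_winner(versions):
    """Return the copy with the newest mtime (last writer wins).

    Ties on mtime are broken by node name so that every node
    independently picks the same winner.
    """
    if not versions:
        raise ValueError("no versions to choose from")
    return max(versions, key=lambda v: (v.mtime, v.node))


if __name__ == "__main__":
    copies = [
        FileVersion("app-1", 100.0, "aaa"),
        FileVersion("app-2", 160.0, "bbb"),
    ]
    print(pick_winner(copies).node)
```

The losing copy is simply discarded, which is only tolerable because data loss is stated to be acceptable here.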

Background: I host 2000 Magento shops, using NFS at the moment. It sucks (SPOF, reliability), so I'm looking for an alternative. Magento itself can run on read-only storage; however, 98% of these shops use external modules that somehow depend on a shared writable filesystem. Now I could tell the shop developers that they should ditch these modules, but I'm afraid I wouldn't keep many customers ;)

Thanks!

Thanks! Unfortunately, I have little control over the application itself, so the POSIX requirement (regular FS interactions) is a hard one. — Willem, Jul 18 '12 at 10:40
That's a pity; Cassandra is built for just the requirements you have. From a support point of view: get control of the application, get the source. It will be very hard to debug problems if you don't have the developer on board (or can't fix it yourself). — The Unix Janitor, Jul 18 '12 at 11:32
Also, what exactly are you trying to do? What's the application, and what does it do? Do I have to sign more NDAs? — The Unix Janitor, Jul 18 '12 at 11:36
What problems are you having with NFS? Maybe tune, optimise and cluster NFS. It shouldn't be too difficult to get lots of read-only NFS replicas with a few servers and rsync. Else, forget NFS, and rsync to all your servers. Do you have a diagram of your farm? — The Unix Janitor, Jul 18 '12 at 15:37
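
The "rsync to all your servers" idea from the last comment could look roughly like the sketch below: a one-way push from a single master copy to each application server. The host names and the `/srv/shops` path are placeholders, not taken from the question, and the `dry_run` switch is just a convenience added here. Note that `--delete` removes files on the replica that are absent on the master, so anything written locally on a replica is lost at the next push.

```python
import subprocess


def build_rsync_commands(source_dir, hosts, dest_dir):
    """Build one rsync command per host for a one-way push.

    The trailing slash on the source makes rsync copy the directory's
    contents rather than the directory itself.
    """
    src = source_dir.rstrip("/") + "/"
    return [
        ["rsync", "-a", "--delete", src, f"{host}:{dest_dir}"]
        for host in hosts
    ]


def push(source_dir, hosts, dest_dir, dry_run=True):
    """Print the commands (dry run) or execute them one host at a time."""
    for cmd in build_rsync_commands(source_dir, hosts, dest_dir):
        if dry_run:
            print(" ".join(cmd))
        else:
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Placeholder hosts; in an autoscaling pool this list would have to
    # come from wherever the current set of instances is tracked.
    push("/srv/shops", ["app-1", "app-2"], "/srv/shops")
```

Run from cron every few minutes, this would stay within the 5-minute propagation window, but only for writes made on the master copy.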

score 2 · Answer 1 · answered Jul 18 '12 at 17:30

You're obviously using NFS at the moment, and it ticks all your boxes (async/POSIX feature-set/low latency) - so why don't you simply combine NFS with DRBD + Heartbeat to achieve a HA NFS solution?

Amazon cloud != low latency - and Gluster can be finicky at the best of times. We gave it a run, but its reliability just wasn't there - and wasn't any more appropriate than NFS.

Short of stale handles, we've had no issues with NFS on clusters - but I guess it depends on what throughput you are pushing, what switches you are subject to on Amazon's infrastructure etc.

P.S. Hi Willem (I know what company you represent, and I'm surprised you're posting on here looking for support!).
