
The Setup

I administer the backend for a website that currently runs on a single node using Nginx (web server), Neo4j (database) and WildFly (app server). The website is getting enough traffic that we are both storage- and memory-limited on the current 'all-in-one' node, so I instantiated two more VPS nodes (3 in total) that will only run WildFly.

I've successfully configured Nginx to use the 'hash' load-balancing feature across the 3 nodes, keyed on a user ID contained in the website URI, to ensure users are consistently routed to the same VPS node running WildFly and so optimize caching.
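A simplified sketch of the kind of upstream config I mean (the user-ID key `$arg_uid` and the backend addresses are placeholders, not my actual values):

```nginx
# Simplified sketch: the hash key ($arg_uid) and backend addresses are placeholders
upstream wildfly {
    hash $arg_uid consistent;   # same user ID always maps to the same WildFly node
    server 10.0.0.11:8080;
    server 10.0.0.12:8080;
    server 10.0.0.13:8080;
}

server {
    listen 80;
    location / {
        proxy_pass http://wildfly;
    }
}
```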

Each of the 3 nodes has its own 150GB high-availability block storage volume (maintained by the VPS provider) mounted as a single /images directory, which the WildFly app on that node reads image files from and writes them to.

Update

The image files should be write-once/read-many (at least for the nominal case): new images get created all the time, but existing images rarely get updated. Additionally, because of Nginx's hash load-balancing, each WildFly node should already have all the images it needs for the clients that get routed to it. The need for replication is really twofold:

  1. It makes adding or removing Wildfly nodes transparent as each node has all the data from the other nodes
  2. It makes backing up easier as everything is consolidated in one place

Additionally, each of the VPS nodes is part of a private gigabit VLAN that the VPS provider enables for all nodes in the same datacenter (which all of my nodes are in). It is this link that the replication traffic will traverse.

The Problem

Because the app is now distributed, I want each of the /images directories across the 3 nodes to be fully replicated. Although Nginx's 'hash' load-balancing ensures consistent node usage on a per-user basis, I want the contents of the /images directory to be a union of all three nodes in case one of the nodes goes down and users need to be redistributed across the other available nodes.

The Question

What is the best way to address the problem above? From my understanding, rsync is not the appropriate tool for this job. There is this Server Fault question which is similar in nature, but it's 12 years old and I'm sure there have been some advances in data replication since that time.

In my research, I came across GlusterFS, which looks promising, but it's unclear how to set it up to address my problem. Would I make the high-availability block storage device on each node a single 'brick' and then combine those into a single Gluster volume? I presume I would then create the /images directory on this Gluster volume and mount it on each of the nodes via the native FUSE client? My gut says this is not correct, because each node would be both client and server simultaneously, contributing a 'brick' while also reading/writing to the Gluster volume, which seems unconventional.
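For concreteness, I presume the setup would look roughly like this, where node1/node2/node3 and the brick path /data/brick are placeholders (the brick would live on each node's provider block storage, and the Gluster volume itself would then be mounted at /images):

```sh
# Run from node1: join the other nodes into the trusted pool
gluster peer probe node2
gluster peer probe node3

# Create a 3-way replicated volume with one brick per node
gluster volume create images replica 3 \
    node1:/data/brick node2:/data/brick node3:/data/brick
gluster volume start images

# On every node: mount the volume at /images via the native FUSE client
mount -t glusterfs localhost:/images /images
```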

SiegeX
  • Do you have a beefy and low-latency connection in between the cluster nodes? – BaronSamedi1958 Jul 10 '23 at 08:32
  • The VPS provider allows each node that exists in the same datacenter (which all my nodes do) to have a Virtual Private Network. Running `iperf` across two nodes it appears this is a gigabit VLAN; not sure if this qualifies as "beefy." – SiegeX Jul 12 '23 at 07:14
  • 1Gb connectivity should be enough for virtual SAN solutions! – BaronSamedi1958 Jul 12 '23 at 18:08

1 Answer


The SAN model supposes that you have a highly available block storage service. You could implement the same thing at the file level (e.g. NFS), but this would mean adding more nodes (or putting additional workload on your existing hosts), and making NFS highly available is a bit tricky.

Another option for block-level replication is DRBD. But with conventional filesystems it's not a good idea to have the filesystem mounted by more than one host, so it needs to be used in combination with a cluster filesystem such as GFS2, and that is still rather complex and esoteric.

Combined with HTTP caching on the reverse proxy, you could have the cache as the preferred location, the "primary" web server next, and the local storage as a third fallback, meaning that you are still handling most of the reads on the local filesystem but only have an issue of replication lag if a node is down.

Then there are filesystems that replicate. GlusterFS is probably the best choice here, and your interpretation of how it works is accurate, but your concerns are not warranted: this is exactly how GlusterFS is expected to be used.

You mention VPS: the hypervisor may already provide a mechanism for sharing a block device across multiple hosts (e.g. io2 on AWS, shared volumes and directory storage on Proxmox), but you would still need to use a parallel filesystem (GFS2) here.

A quick mention here for ZFS replication - which is great but only really works between 2 nodes.
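A minimal sketch of what that would look like between two nodes, assuming the images live on a ZFS dataset (tank/images, the snapshot names and the node2 hostname are placeholders):

```sh
# Take a snapshot and send the full dataset to the second node
zfs snapshot tank/images@snap1
zfs send tank/images@snap1 | ssh node2 zfs receive -F tank/images

# Later, send only the changes since the previous snapshot
zfs snapshot tank/images@snap2
zfs send -i snap1 tank/images@snap2 | ssh node2 zfs receive tank/images
```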

But really your choice depends on 2 specific predicates you have not addressed in your question: how quickly do files change, and how are they changed? Maybe all you need is something like lsyncd (there are links to other solutions in the documentation) or perhaps even rsync.
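As a minimal sketch of the rsync route, assuming files are only ever added (never modified in place) and that each node pushes to the other two (node2/node3 are placeholder hostnames on the private VLAN; this would run from cron or a systemd timer on each node):

```sh
# Push images this node has that the others don't; --ignore-existing relies on
# the write-once assumption (existing files are never modified)
rsync -a --ignore-existing /images/ node2:/images/
rsync -a --ignore-existing /images/ node3:/images/
```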

symcbean
  • Appreciate the answer. My VPS provider (Vultr) only allows block storage to be mounted to a single host. They do provide S3-compatible object storage but explicitly say they do not support using it as a block device or mounted filesystem. I updated my question to address your last two important questions. With these updates, would you say `lsyncd` is still a potential solution? It appears to use `rsync` under the hood, which I thought is limited to only 2 nodes, and whatever solution I use, I do want it to easily scale as we grow. – SiegeX Jul 12 '23 at 06:52
  • If you've got limited options on the infrastructure and if the content is just added to each time, then the master node / http caching / rsync-or-lsync looks like a good solution. I don't understand the issue with rsync - yes, you do need to rsync to each non-master node. Tricky part is routing the file write to the master node and delegating that role if the current master node is offline. – symcbean Jul 12 '23 at 10:51
  • OTOH with only 3 nodes, it would be quite possible for each node to mount the relevant directory (via sshfs if NFS is not available / can't be secured) from BOTH of the other 2, and use rsync to replicate locally on each node. Gets messy if you want to scale this though. – symcbean Jul 12 '23 at 10:58
  • Thank you, with your guidance I was able to setup `glusterfs` and use all 3 nodes as both servers (hosting one brick each) as well as clients to the gluster volume -- it works great and scales arbitrarily large too! – SiegeX Jul 29 '23 at 04:09