
I want to share data among multiple AWS instances in a high-performance, low-latency way. Giving all instances read-only access (except one instance that would handle writes) is fine. Two points about this use case:

  1. Nodes attached to the volume might come and go at any time (start, stop, be terminated, etc.).
  2. The shared data includes thousands of potentially small files that need to be listed and have their metadata checked.

I initially tried EFS, but it is rather slow for operations that need to enumerate or modify hundreds or thousands of small files.
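For context, the slowness shows up in plain metadata sweeps. Here is a minimal sketch of the kind of enumeration that gets painful on EFS (the mount point is a hypothetical example; this just times a stat of every file under it):

    import os
    import time

    MOUNT = "/mnt/efs/shared"  # hypothetical EFS mount point

    start = time.monotonic()
    count = 0
    for root, dirs, files in os.walk(MOUNT):
        for name in files:
            os.stat(os.path.join(root, name))  # roughly one metadata round trip per file
            count += 1

    print(f"stat'ed {count} files in {time.monotonic() - start:.1f}s")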

So now I'm considering EBS Multi-Attach. However, to prevent data corruption, AWS recommends using only a clustered filesystem such as GFS2 or OCFS2. Both of those appear to be complex and finicky to configure, as well as fragile for a cluster where nodes might come and go at any time. For example, GFS2 requires the cluster software on all nodes to be restarted if the number of nodes goes from more than two to exactly two, and adding a new node involves logging in to an existing node, running some commands, and possibly redistributing an updated config file to all the other nodes. It just seems really inflexible, with a lot of extra overhead.
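For reference, the Multi-Attach setup itself is simple; the burden is the cluster-aware filesystem on top of it. A minimal boto3 sketch (region, AZ, size, and instance IDs are placeholders):

    import boto3

    ec2 = boto3.client("ec2", region_name="us-east-1")  # placeholder region

    # Multi-Attach requires a Provisioned IOPS (io1/io2) volume.
    vol = ec2.create_volume(
        AvailabilityZone="us-east-1a",  # placeholder AZ; must match the instances
        Size=100,                       # GiB, placeholder
        VolumeType="io2",
        Iops=3000,
        MultiAttachEnabled=True,
    )
    ec2.get_waiter("volume_available").wait(VolumeIds=[vol["VolumeId"]])

    # Attach the same volume to several instances (placeholder instance IDs).
    for instance_id in ["i-0123456789abcdef0", "i-0fedcba9876543210"]:
        ec2.attach_volume(VolumeId=vol["VolumeId"], InstanceId=instance_id, Device="/dev/sdf")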

But if I were sure that only one instance would be writing to the disk (or that each instance could write only to its own subfolder or even its own disk partition), could I use a regular filesystem like XFS on this volume and get away with it? Or would there be subtle data-corruption issues even if access is technically read-only, or write access is restricted to instance-specific subfolders or partitions?

Or is there a completely different solution I'm missing?

J. Miller
  • what solution did you go with in the end? looking at something very similar for myself – John M Dec 22 '22 at 15:18
  • Short answer: you absolutely need a clustered filesystem for multi-attach, but I don't recommend doing it at all due to the pain points I described in the original post. For infrequently changing files you can distribute them on custom AMIs or EBS snapshots, or download them from S3 on boot, or use rsync or something. For everything else just use EFS. – J. Miller Dec 24 '22 at 06:04
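A minimal sketch of the "download from S3 on boot" option mentioned in the comment above, using boto3; the bucket, prefix, and destination path are hypothetical placeholders:

    import os
    import boto3

    BUCKET = "my-shared-data-bucket"  # placeholder
    PREFIX = "shared/"                # placeholder
    DEST = "/srv/shared"              # placeholder local destination

    s3 = boto3.client("s3")
    paginator = s3.get_paginator("list_objects_v2")

    # Run once at boot (e.g. from cloud-init or a systemd unit) to pull the shared files.
    for page in paginator.paginate(Bucket=BUCKET, Prefix=PREFIX):
        for obj in page.get("Contents", []):
            key = obj["Key"]
            if key.endswith("/"):  # skip "directory" placeholder objects
                continue
            target = os.path.join(DEST, os.path.relpath(key, PREFIX))
            os.makedirs(os.path.dirname(target), exist_ok=True)
            s3.download_file(BUCKET, key, target)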

2 Answers


I have tested this (XFS) and it doesn't work. You need a clustered filesystem. Do look at other options such as Veritas InfoScale.

Vijay

Sharing static volume content appears to work fine with multi-attach and regular XFS. Hot "adds" to the volume are only visible to the instance that wrote the data. With that established, I did not test hot "updates" or "deletes"; I assume they would also be seen only by the writer, but they could potentially break access to that data for the other instances. Rebooted, restarted, and reconnected instances do see the latest volume state. So a workflow where one instance infrequently writes new data and then triggers, e.g., forced reboots of the others so they eventually see that data appears to be something this technology may support.
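A minimal sketch of that publish-then-reboot pattern, assuming a single writer and hypothetical reader instance IDs:

    import boto3

    READER_IDS = ["i-0aaaaaaaaaaaaaaaa", "i-0bbbbbbbbbbbbbbbb"]  # placeholder instance IDs

    ec2 = boto3.client("ec2", region_name="us-east-1")  # placeholder region

    # After the writer instance has finished publishing new files to the volume,
    # reboot the readers so they remount the filesystem and see the latest state.
    ec2.reboot_instances(InstanceIds=READER_IDS)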

SVUser