
Anyone have a sound strategy for implementing NFS on AWS in such a way that it's not a SPoF (single point of failure), or that can at the very least recover quickly if an instance crashes?

I've read this SO post about sharing files among multiple EC2 instances, but it doesn't answer how to ensure HA for NFS on AWS; it only establishes that NFS can be used.

A lot of online resources say that AWS EFS is an option, but it is still in preview mode and only available in the Oregon region. Our primary VPC is located in N. California, so we can't use this option.

Other online resources say that GlusterFS is the way to go, but after some research I just don't feel comfortable implementing this solution due to race conditions and performance concerns.

Another option is SoftNAS, but I want to avoid bringing an unknown AMI into a tightly controlled, homogeneous environment.

Which leaves NFS. NFS is what we use in our dev environment and it works fine, but it's dev: if it crashes we go get a couple beers while systems fixes the problem. In production, this is obviously a no-go.

The best solution I can come up with at this point is to create an EBS volume and two EC2 instances. Both instances will be updated as normal (via puppet) to maintain stack alignment (kernel, nfs libs etc), but only one instance will mount the EBS volume. We set up a monitor on the active NFS instance, and if it goes down, we are notified and manually detach the volume and attach it to the backup EC2 instance. I'm thinking we also create a network interface that can likewise be detached and re-attached, so we only need to maintain a single IP in DNS.

Although I suppose we could do this automatically with keepalived and an IAM policy that allows the automatic detachment/re-attachment.
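To make that concrete, a failover of this kind could be scripted with the AWS CLI roughly as follows. This is only a sketch: the volume ID, ENI ID, instance ID and device names are all placeholders, and it assumes both instances sit in the same availability zone (EBS volumes cannot be attached across AZs).

```shell
#!/bin/bash
# Sketch: move the EBS data volume and a floating ENI from the failed
# NFS primary to the standby instance. All IDs are placeholders.
set -euo pipefail

VOLUME_ID="vol-0123456789abcdef0"   # shared NFS data volume (placeholder)
ENI_ID="eni-0123456789abcdef0"      # floating network interface (placeholder)
STANDBY_ID="i-0fedcba9876543210"    # backup NFS instance (placeholder)

# Force-detach the data volume from the dead primary and wait for it.
aws ec2 detach-volume --volume-id "$VOLUME_ID" --force
aws ec2 wait volume-available --volume-ids "$VOLUME_ID"

# Attach the volume to the standby.
aws ec2 attach-volume --volume-id "$VOLUME_ID" \
  --instance-id "$STANDBY_ID" --device /dev/xvdf

# Move the floating ENI over as well, so clients keep the same IP.
ATTACHMENT=$(aws ec2 describe-network-interfaces \
  --network-interface-ids "$ENI_ID" \
  --query 'NetworkInterfaces[0].Attachment.AttachmentId' --output text)
if [ "$ATTACHMENT" != "None" ]; then
  aws ec2 detach-network-interface --attachment-id "$ATTACHMENT" --force
fi
aws ec2 attach-network-interface --network-interface-id "$ENI_ID" \
  --instance-id "$STANDBY_ID" --device-index 1

# Remaining step (not shown): on the standby, mount /dev/xvdf and
# start the NFS server, e.g. via a boot script or SSH.
```

The keepalived idea would amount to running this script as the failover action, with an IAM instance role granting the `ec2:DetachVolume`, `ec2:AttachVolume` and network-interface attach/detach permissions.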

--UPDATE--

It looks like EBS volumes are tied to specific availability zones, so re-attaching to an instance in another AZ is impossible. The only other option I can think of is:

  1. Create EC2 in each AZ, in public subnet (each have EIP)
  2. Create route 53 healthcheck for TCP:2049
  3. Create Route 53 failover policies for nfs-1 (AZ1) and nfs-2 (AZ2)
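The health check and failover records in the steps above might look like this with the AWS CLI. The hosted-zone ID, health-check ID, domain name and EIPs are placeholders:

```shell
# 1) TCP health check against the NFS port on nfs-1's EIP (placeholder IP).
aws route53 create-health-check --caller-reference nfs-1-check \
  --health-check-config \
  IPAddress=54.0.0.1,Port=2049,Type=TCP,RequestInterval=30,FailureThreshold=3

# 2) PRIMARY/SECONDARY failover A records pointing at the two EIPs.
aws route53 change-resource-record-sets --hosted-zone-id Z1EXAMPLE \
  --change-batch '{
    "Changes": [
      {"Action": "UPSERT", "ResourceRecordSet": {
        "Name": "nfs.example.com", "Type": "A", "TTL": 60,
        "SetIdentifier": "nfs-1", "Failover": "PRIMARY",
        "HealthCheckId": "abcd1234-example",
        "ResourceRecords": [{"Value": "54.0.0.1"}]}},
      {"Action": "UPSERT", "ResourceRecordSet": {
        "Name": "nfs.example.com", "Type": "A", "TTL": 60,
        "SetIdentifier": "nfs-2", "Failover": "SECONDARY",
        "ResourceRecords": [{"Value": "54.0.0.2"}]}}
    ]}'
```

One caveat worth noting: NFS clients resolve the server name at mount time, so even after DNS fails over, already-mounted clients will keep trying the old IP until they remount.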

The only question here is, what's the best way to keep the two NFS servers in-sync? Just cron an rsync script between them?
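If you do go the rsync route, the cron entry could be as simple as the following. The hostnames and paths are made up, and note that this is one-way and asynchronous: anything written between runs is lost on failover.

```shell
# /etc/cron.d/nfs-sync on nfs-1 (placeholder host and path names)
# Push the export to the standby every 5 minutes over SSH.
*/5 * * * * root rsync -az --delete /export/ nfs-2:/export/
```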

Or is there a best practice that I am completely missing?

Mike Purcell
  • Depends on your requirements: Do you need to be able to fail-over into another availability zone? What's your RPO and RTO? Do you want the clients to failover automatically? – Andreas Oct 29 '15 at 15:32
  • @Andreas I updated OP. – Mike Purcell Oct 29 '15 at 15:41
  • This is a good question - sad to see there's been little traction. I took matters into my own hands and made use of `VxLAN`, `FRRouting`, `DRBD`, `NFS`, `Corosync`, and `Pacemaker` to get it to work. It is not for the faint of heart - but will save you $$. I now have a fully functional `HA DRBD+NFS` cluster across 2 availability zone. I'll try and document my journey when time avails. – AnthonyK Apr 18 '22 at 11:18

2 Answers


There are a few options for building a highly available NFS server yourself, though I would still prefer EFS or GlusterFS, because all of the do-it-yourself solutions below have downsides.

a) DRBD: It is possible to synchronize volumes with the help of DRBD. This allows you to mirror your data. Use two EC2 instances in different availability zones for high availability. Downside: configuration and operation are complex.
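For reference, a minimal DRBD resource definition for option (a) could look like the following. The hostnames, private IPs and device names are assumptions, not a prescription:

```
# /etc/drbd.d/r0.res -- minimal sketch; all names and addresses are placeholders
resource r0 {
  protocol C;                  # synchronous replication between the two nodes
  on nfs-1 {
    device    /dev/drbd0;
    disk      /dev/xvdf;       # backing EBS volume
    address   10.0.1.10:7788;  # private IP in AZ 1
    meta-disk internal;
  }
  on nfs-2 {
    device    /dev/drbd0;
    disk      /dev/xvdf;
    address   10.0.2.10:7788;  # private IP in AZ 2
    meta-disk internal;
  }
}
```

The NFS server then exports the filesystem on `/dev/drbd0` from whichever node is currently primary; promoting the standby is what a cluster manager such as Pacemaker automates.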

b) EBS Snapshots: If an RPO of more than 30 minutes is acceptable, you can use periodic EBS snapshots to recover from an outage in another availability zone. This can be achieved with an Auto Scaling group running a single EC2 instance, a user-data script, and a cronjob for periodic EBS snapshots. Downside: RPO > 30 min.
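Option (b) can be as simple as a cron job calling the AWS CLI; the volume ID below is a placeholder, and pruning of old snapshots is left out:

```shell
# /etc/cron.d/ebs-snapshot -- snapshot the NFS data volume every 30 minutes.
# The volume ID is a placeholder; note that % must be escaped in crontab.
*/30 * * * * root aws ec2 create-snapshot \
  --volume-id vol-0123456789abcdef0 \
  --description "nfs-data $(date -u +\%Y-\%m-\%dT\%H:\%M)"
```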

c) S3 Synchronization: It is possible to synchronize the state of an EC2 instance acting as the NFS server to S3. The standby server uses S3 to stay up to date. Downside: syncing lots of small files to S3 takes too long.
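Option (c) is essentially a pair of `aws s3 sync` jobs; the bucket name and paths here are placeholders:

```shell
# On the active NFS server: push the export to S3 (placeholder bucket/path),
# typically from cron.
aws s3 sync /export s3://my-nfs-backup/export --delete

# On the standby: pull the latest state down before taking over.
aws s3 sync s3://my-nfs-backup/export /export --delete
```

Since `s3 sync` compares each file's size and timestamp, runs over a tree with many small files spend most of their time on per-object requests, which is the downside mentioned above.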

I recommend watching this talk from AWS re:Invent: https://youtu.be/xbuiIwEOCAs

Andreas
  • Thanks for the reply. I saw some posts about implementing DRBD and we don't have the time to incorporate that solution right now. We have to switchover by end of this week, so at this point we only have a single NFS server running, next week we will implement some backup solution and will post what we came up with. – Mike Purcell Nov 02 '15 at 14:51

AWS has reviewed and approved a number of SoftNAS AMIs, which are available on AWS Marketplace. The jointly published SoftNAS Architecture on AWS White Paper provides more details:

  • Security (pages 4-11)
  • HA across AZs (pages 13-14)

You can also try a 30-day free trial to see if it meets your needs: http://softnas.com/tryaws

Full disclosure: I work for SoftNAS.

Bryon
  • Ahhh nice. The only prob with SoftNAS is it's another cost to incur, and it introduces a rogue node into a tightly controlled homogeneous environment. – Mike Purcell Nov 02 '15 at 21:21