
Due to the recent S3 outage on the East Coast, I want to ask the community: what is the best way to implement a fault-tolerant S3 website-hosting solution?

From my understanding, you need to name the bucket after your domain (e.g., example.com). But a bucket lives in a single region, and bucket names are globally unique, so I cannot create a bucket with the same name in another region. If the region hosting that bucket goes down, wouldn't that mean my website is down?

TRiG
Justin

1 Answer


The short answer is: I can't find a good way of having an S3-hosted static website survive a region failure without using additional logic or servers. I'd be really interested if anyone else can come up with a way.

You could put CloudFront in front of the S3 website. If the bucket goes down, content will continue to be served from the cache, even if it's stale. This of course relies on the content already being cached at the edge location closest to the customer.
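As a rough illustration, here's a minimal boto3 sketch of that setup, with a long default TTL so cached objects keep being served through a short origin outage. The bucket name, region, and TTL values are assumptions, and the `DistributionConfig` is trimmed to the required fields:

```python
# Minimal sketch: put CloudFront in front of an S3 website endpoint with a
# long default TTL, so cached objects keep being served if the origin fails.
# Assumptions: bucket "example.com" hosted in us-east-1; TTLs are illustrative.
import time
import boto3

cloudfront = boto3.client("cloudfront")

response = cloudfront.create_distribution(
    DistributionConfig={
        "CallerReference": str(time.time()),  # any unique string
        "Comment": "Cache in front of the S3 website endpoint",
        "Enabled": True,
        "Origins": {
            "Quantity": 1,
            "Items": [
                {
                    "Id": "s3-website-origin",
                    # S3 *website* endpoints are plain-HTTP custom origins.
                    "DomainName": "example.com.s3-website-us-east-1.amazonaws.com",
                    "CustomOriginConfig": {
                        "HTTPPort": 80,
                        "HTTPSPort": 443,
                        "OriginProtocolPolicy": "http-only",
                    },
                }
            ],
        },
        "DefaultCacheBehavior": {
            "TargetOriginId": "s3-website-origin",
            "ViewerProtocolPolicy": "allow-all",
            "ForwardedValues": {
                "QueryString": False,
                "Cookies": {"Forward": "none"},
            },
            "TrustedSigners": {"Enabled": False, "Quantity": 0},
            "MinTTL": 0,
            "DefaultTTL": 86400,   # 1 day: content can outlive a short outage
            "MaxTTL": 604800,      # 7 days
        },
    }
)
print(response["Distribution"]["DomainName"])
```

The caveat above still applies: an object only survives the outage at edge locations where it was already cached.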

You could use S3 cross-region replication to copy the data to another region. Originally I thought you could use Route 53 failover routing to point at whichever bucket is working, but this won't work: cross-region replication must target a bucket with a different name, and S3 website hosting will only serve a site from a bucket named after the domain. I wondered if there was a manual workaround, but you can't rename buckets, and if the first region is down you probably can't delete the bucket holding the website name either.
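For reference, a minimal boto3 sketch of the replication setup, which also shows exactly where the constraint bites: the destination must be a differently named bucket, so it can't serve the website for the same domain. Bucket names and the IAM role ARN are placeholders:

```python
# Minimal sketch of S3 cross-region replication via boto3. Both buckets must
# have versioning enabled, and the destination must have a *different* name,
# which is why the replica can't host the website for the same domain.
# Assumptions: bucket names and the IAM role ARN are placeholders, and the
# replica bucket already exists in another region.
import boto3

s3 = boto3.client("s3")

for bucket in ("example.com", "example.com-replica"):
    s3.put_bucket_versioning(
        Bucket=bucket,
        VersioningConfiguration={"Status": "Enabled"},
    )

s3.put_bucket_replication(
    Bucket="example.com",
    ReplicationConfiguration={
        "Role": "arn:aws:iam::123456789012:role/s3-replication-role",
        "Rules": [
            {
                "ID": "replicate-everything",
                "Prefix": "",          # empty prefix = all objects
                "Status": "Enabled",
                "Destination": {
                    "Bucket": "arn:aws:s3:::example.com-replica",
                },
            }
        ],
    },
)
```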

I thought a combination of CloudFront and Route53 might be made to work, but it would rely on creating two CloudFront distributions that serve the same domain, and that doesn't seem to be possible.

This isn't ideal, but I think you could do it with EC2 and Route53: create servers in two or more regions, have them proxy S3 content using Nginx or similar, and balance between them with Route53 failover routing (sketched below). This defeats the point of S3 hosting and is overall a terrible idea, but it could probably be made to work if it absolutely had to.
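A hedged sketch of the Route53 side, assuming two proxy instances with static IPs and an existing health check against the primary; the hosted zone ID, addresses, and health check ID are all placeholders:

```python
# Minimal sketch: Route 53 failover routing between two proxy instances in
# different regions. The hosted zone ID, IP addresses, and health check ID
# are placeholders.
import boto3

route53 = boto3.client("route53")

route53.change_resource_record_sets(
    HostedZoneId="Z0000000000000000000",
    ChangeBatch={
        "Changes": [
            {
                "Action": "UPSERT",
                "ResourceRecordSet": {
                    "Name": "example.com.",
                    "Type": "A",
                    "SetIdentifier": "proxy-primary",
                    "Failover": "PRIMARY",
                    "TTL": 60,
                    "ResourceRecords": [{"Value": "203.0.113.10"}],
                    # Health check against the primary proxy; Route 53 serves
                    # the SECONDARY record when this check fails.
                    "HealthCheckId": "11111111-1111-1111-1111-111111111111",
                },
            },
            {
                "Action": "UPSERT",
                "ResourceRecordSet": {
                    "Name": "example.com.",
                    "Type": "A",
                    "SetIdentifier": "proxy-secondary",
                    "Failover": "SECONDARY",
                    "TTL": 60,
                    "ResourceRecords": [{"Value": "198.51.100.20"}],
                },
            },
        ]
    },
)
```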

Tim
  • I use HAProxy in front of S3 as often as not (numerous advantages, incl. real time logs)... but the proxy solution only needs to be part of the *backup* strategy -- and that's a perfectly good way to rewrite the `Host:` header to the backup bucket's name after DNS fails over from the primary bucket to the proxy (sketched after these comments). If CloudFront is healthy and configurable, you could use CloudFront and change the default cache behavior to point to a standby bucket. Or, create a backup Google Cloud Storage bucket with the same DNS-compatible name as the S3 bucket, replicate the objects and fail over the DNS. – Michael - sqlbot Mar 15 '17 at 00:07
  • @Michael-sqlbot does that solve the region outage problem? You'd probably still need S3 replication and servers in more than one region. It's also a bottleneck: it puts a server or cluster in front of S3, a highly reliable, massively scalable distributed object store. – Tim Mar 15 '17 at 00:16
  • The bottleneck is as theoretical as anything unless we're talking insane traffic: an EC2 instance in the same region as a bucket has essentially wire-speed connectivity to S3, and a tight proxy like HAProxy can do hundreds of thousands of reqs/day on a t2.nano (shhh... all my proxies are some size of t2, with healthy credit balances). If the backup bucket and proxy are in region B, and Route 53 DNS pointing to the primary bucket in region A fails over to the alternate bucket via the proxy, then yes, with S3 replication you have a readable failover solution. – Michael - sqlbot Mar 15 '17 at 01:29
  • ...all the same, don't get me wrong -- you are absolutely correct that there is **no** truly native solution. Any solution does have some user-serviceable parts inside. – Michael - sqlbot Mar 15 '17 at 01:31
  • Does R53 support failing over from S3 to an EC2 instance using failover routing? Gut feel is it probably should. I run LAMP stacks and websites on t2.nano instances; though not high traffic, they're fairly capable for small workloads. – Tim Mar 15 '17 at 01:32
  • Route 53 can change the DNS response based on health checks -- but the thing being checked doesn't *need* to be the thing being pointed to by the DNS records. Obviously, it usually would, but in this case, I'd configure a proxy target -- instead of actually testing S3 with the health check, have an EC2 instance respond to the Route 53 checks and simulate a failure if the checking proxy can't access S3. (For comparison, I have external-facing web servers whose sole purpose is to respond `200 OK` to a Route 53 check if and only if an internal MySQL replica is in sync with the master). – Michael - sqlbot Mar 15 '17 at 01:46
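For completeness, here's a minimal sketch of the `Host:`-header rewrite Michael describes above, written as a tiny Python proxy rather than HAProxy; the backup bucket name and region are assumptions:

```python
# Minimal sketch of a Host-rewriting proxy: accept requests for example.com
# and forward them to a differently named backup bucket's website endpoint.
# The bucket name and region are placeholders.
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer
from urllib.request import Request, urlopen
from urllib.error import HTTPError

# Backup bucket website endpoint -- the Host header S3 expects to see.
BACKUP_ENDPOINT = "example.com-replica.s3-website-us-east-2.amazonaws.com"

class S3Proxy(BaseHTTPRequestHandler):
    def do_GET(self):
        # Re-issue the request against the backup bucket; urlopen sends
        # Host: BACKUP_ENDPOINT upstream instead of the client's
        # Host: example.com, which is the "rewrite".
        upstream = Request(f"http://{BACKUP_ENDPOINT}{self.path}")
        try:
            with urlopen(upstream, timeout=10) as resp:
                body = resp.read()
                self.send_response(resp.status)
                self.send_header("Content-Type",
                                 resp.headers.get("Content-Type", "text/html"))
                self.send_header("Content-Length", str(len(body)))
                self.end_headers()
                self.wfile.write(body)
        except HTTPError as err:
            # Pass S3 errors (404, 403, ...) straight through to the client.
            self.send_error(err.code, err.reason)

if __name__ == "__main__":
    ThreadingHTTPServer(("0.0.0.0", 80), S3Proxy).serve_forever()
```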