I'm building an API, and for some responses it will stream the content of S3 objects back to the requester. I would prefer to serve the content directly rather than send a 302 redirect (e.g. to a CloudFront distribution).

The default is that I read the file into the application and then stream it back out.
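
For illustration, the default looks roughly like this (a minimal sketch assuming a Flask handler and boto3; the bucket and key names are made up):

```python
import boto3
from flask import Flask, Response

app = Flask(__name__)
s3 = boto3.client("s3")

@app.route("/reports/<report_id>")
def get_report(report_id):
    # The object is pulled through the application and streamed back out.
    obj = s3.get_object(Bucket="example-api-cache", Key=f"reports/{report_id}.json")
    return Response(
        obj["Body"].iter_chunks(chunk_size=8192),  # StreamingBody supports chunked reads
        content_type=obj["ContentType"],
    )
```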

If I were using Apache or nginx with a local file system, I could ask the reverse proxy to stream the content directly from disk with X-Sendfile or X-Accel-Redirect.
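
On the application side that pattern looks roughly like this (a sketch; it assumes nginx has an `internal` location mapping `/protected/` to the directory holding the files):

```python
from flask import Flask, Response

app = Flask(__name__)

@app.route("/reports/<report_id>")
def get_report(report_id):
    # The app only sends headers; nginx intercepts X-Accel-Redirect and
    # streams the file from disk itself, so it never passes through the app.
    resp = Response(status=200)
    resp.headers["X-Accel-Redirect"] = f"/protected/reports/{report_id}.json"
    resp.headers["Content-Type"] = "application/json"
    return resp
```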

Is there an AWS-native mechanism for doing this, so I can avoid loading the file into the application and serving back out again?

Joe
  • Uh, Cloudfront? – lxg Dec 30 '22 at 13:22
  • Yeah it sounds like you need a CDN. – Mark B Dec 30 '22 at 13:47
  • I've only ever used cloudfront to serve S3 files directly (i.e. from a different path from the API). How would you configure cloudfront to serve the content from an arbitrary API endpoint? And wouldn't serving a whole API through a CDN add latency and introduce caching issues? – Joe Dec 30 '22 at 14:43
  • Fair enough, I think I can come up with some ideas. In the meantime, can you please clarify, when you say “stream the content of S3 objects”, does that involve any sort of intermediary processing? Or is this just the delivery of static objects to a client? – lxg Dec 31 '22 at 00:13
  • Thanks! Literally send the byte stream just as I'd use `x-sendfile`. Use case is that many of the API responses can be pre-generated and the cache is large (terabytes in total) and long-lived (almost never invalidated). I want the responses to come from the API transparently, not from a separate hostname or path. – Joe Dec 31 '22 at 07:48
  • I think this question would be a dupe of mine if I served the API through cloudfront (that's not a given). The answer was no in 2016. https://stackoverflow.com/questions/39131427/dynamically-choose-an-s3-object-to-be-served-by-cloudfront/39134845 – Joe Dec 31 '22 at 07:49
  • I’ve posted a few things below, hope this helps. Regarding the 2016 post, I think it misses the Lambda@Edge option. (Not sure when it was introduced, though.) CF Functions were added just last year. – lxg Dec 31 '22 at 09:48

1 Answer

I’m not entirely sure I understand your scenario correctly, but I’m thinking in the following direction:

  • Generally, Cloudfront works like a reverse proxy with a cache attached. (Unlike other vendors’ products where you would “deploy on” the CDN.)

  • You can attach different types of origins to Cloudfront: it has native support for S3 buckets, but basically anything that speaks HTTP can be attached as a custom origin.

  • So, in the most trivial scenario, you would place your S3 bucket behind Cloudfront, add an Origin Access Identity (OAI) and a bucket policy which permits the OAI to access your content. (There’s a rough sketch of such a bucket policy after this list.)

  • In order to benefit from caching on the Cloudfront edge, you will need to configure it appropriately; otherwise it will just act as a proxy. Make sure to set the Cloudfront TTLs for your content, and check how min/max/default TTL work.

  • But also don’t forget to set headers for your clients to cache (Cache-Control etc.); this may save you a lot of money if the same clients need the same content over and over again. (See the Cache-Control example after this list.)

  • As we know, caching, and cache invalidation in particular, are tricky. Make sure to understand how Cloudfront handles caching so you don’t run into problems. For example: cache busting with query parameters does work, but you need to make Cloudfront aware that the query string is significant.

  • Now here comes the exciting part: If you need to react dynamically to the request of the client, you have Lambda@Edge and Cloudfront Functions at your disposal.

    • Lambda@Edge is basically what it says: Lambda functions on the edge. They can hook in at four points: viewer request, origin request, origin response, viewer response. Which one you need depends on what you want to modify: incoming vs. outgoing data, and client-Cloudfront vs. Cloudfront-origin communication. (A sketch of an origin-request function follows after this list.)

    • CF Functions are pretty limited (ES5 only, no XHR or anything, and they only work on viewer request/response) but very cheap at the same time. Check the AWS docs to determine what you need.

  • FWIW, Cloudfront also supports signed cookies and signed URLs in case you need to restrict the content to particular viewers.
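
To illustrate the OAI point: a rough sketch of a bucket policy granting the OAI read access, applied via boto3 (the bucket name and OAI id are placeholders, not specific to your setup):

```python
import json
import boto3

s3 = boto3.client("s3")

# Allow the CloudFront Origin Access Identity (placeholder id) to read objects.
policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Principal": {
            "AWS": "arn:aws:iam::cloudfront:user/CloudFront Origin Access Identity EXAMPLEOAIID"
        },
        "Action": "s3:GetObject",
        "Resource": "arn:aws:s3:::example-api-cache/*",
    }],
}

s3.put_bucket_policy(Bucket="example-api-cache", Policy=json.dumps(policy))
```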
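
For the client-side caching headers, with an S3 origin the simplest option is usually to set Cache-Control on the objects themselves at upload time, roughly like this (again a sketch with a placeholder bucket; adjust the max-age to your invalidation needs):

```python
import boto3

s3 = boto3.client("s3")

# Cache-Control stored on the object is returned by S3 and passed through by
# Cloudfront, so both the edge cache and the client can honour it.
s3.put_object(
    Bucket="example-api-cache",
    Key="reports/123.json",
    Body=b'{"report": "..."}',
    ContentType="application/json",
    CacheControl="public, max-age=86400",
)
```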
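
And for the Lambda@Edge idea, a very rough sketch of an origin-request function (Python runtime; the path-to-key mapping is invented here, yours would encode whatever logic your API needs):

```python
def handler(event, context):
    # Cloudfront hands the origin request to the function via the event record.
    request = event["Records"][0]["cf"]["request"]

    # Map the public API path (e.g. /v1/reports/123) to the S3 key holding the
    # pre-generated response, before Cloudfront contacts the origin.
    report_id = request["uri"].rsplit("/", 1)[-1]
    request["uri"] = f"/pregenerated/reports/{report_id}.json"

    # Returning the modified request makes Cloudfront fetch that object from
    # the S3 origin and cache it under the original public URI.
    return request
```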

lxg
  • Thanks very much for the pointers! I have used cloudfront extensively before for static content, but not with Lambda. This use case isn't a simple website-in-a-bucket, as the API server will have to do some work before returning a response and the structure of the bucket may be different to the public API structure. – Joe Dec 31 '22 at 12:46
  • I could perhaps simulate the behaviour of `X-Sendfile` in Lambda@Edge. I would need to intercept the request/response. But I would still need to write the code to retrieve the data from the bucket. I'm not sure this buys me anything more than letting the application server retrieve the data and send it directly. – Joe Dec 31 '22 at 12:48
  • Yeah, I might still be misinterpreting your requirements, sorry. You are right, you might have to implement logic *somewhere* in the end; the Lambda@Edge might buy you some decoupling, though. Also, it might have a cost effect, one way or the other (i.e. be cheaper or more expensive depending on your setup). Also, did you look into presigned URLs/cookies? If you’re looking to deliver confidential/restricted static content, they are usually a candidate. – lxg Dec 31 '22 at 13:28
  • Thanks again. FWIW all the data here is open so signed URLs don't apply. This definitely gives an alternative option to weigh up against the default though. It might come out cheaper. – Joe Dec 31 '22 at 13:45