I'm having a problem and have tried to follow answers here in the forum, but with no success whatsoever.

In order to generate thumbnails, I have set up the following architecture:
  • an S3 account for the original images
  • an Ubuntu server running NGINX and Thumbor
  • Cloudfront

Users upload original images to S3; the images are then pulled through the Ubuntu server, with Cloudfront in front of the request:

http://cloudfront.account/thumbor-server/http://s3.aws...

The big problem is that we often lose objects from the Cloudfront cache, even though I want them to stay cached for 360 days. I get the following response through the Cloudfront URL:

Cache-Control:max-age=31536000
Connection:keep-alive
Content-Length:4362
Content-Type:image/jpeg
Date:Sun, 26 Oct 2014 09:18:31 GMT
ETag:"cc095261a9340535996fad26a9a882e9fdfc6b47"
Expires:Mon, 26 Oct 2015 09:18:31 GMT
Server:nginx/1.4.6 (Ubuntu)
Via:1.1 5e0a3a528dab62c5edfcdd8b8e4af060.cloudfront.net (CloudFront)
X-Amz-Cf-Id:B43x2w80SzQqvH-pDmLAmCZl2CY1AjBtHLjN4kG0_XmEIPk4AdiIOw==
X-Cache:Miss from cloudfront
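
As a quick sanity check, these headers are internally consistent: max-age=31536000 seconds is exactly 365 days (a full year, slightly more than the 360 days mentioned above), and Expires equals Date plus max-age. A minimal sketch using only the Python standard library, with the header values copied from the response above:

```python
from datetime import timedelta
from email.utils import parsedate_to_datetime

# Header values copied from the Cloudfront MISS response above.
date = parsedate_to_datetime("Sun, 26 Oct 2014 09:18:31 GMT")
expires = parsedate_to_datetime("Mon, 26 Oct 2015 09:18:31 GMT")
max_age = 31536000  # seconds, from "Cache-Control: max-age=31536000"

days = max_age / 86400  # 86400 seconds per day
consistent = date + timedelta(seconds=max_age) == expires

print(days)        # 365.0
print(consistent)  # True
```

So the origin is sending a one-year lifetime; the headers themselves are not the problem.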

After refreshing, I get:

Age:50
Cache-Control:max-age=31536000
Connection:keep-alive
Date:Sun, 26 Oct 2014 09:19:21 GMT
ETag:"cc095261a9340535996fad26a9a882e9fdfc6b47"
Expires:Mon, 26 Oct 2015 09:18:31 GMT
Server:nginx/1.4.6 (Ubuntu)
Via:1.1 5e0a3a528dab62c5edfcdd8b8e4af060.cloudfront.net (CloudFront)
X-Amz-Cf-Id:slWyJ95Cw2F5LQr7hQFhgonG6oEsu4jdIo1KBkTjM5fitj-4kCtL3w==
X-Cache:Hit from cloudfront
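
The Age: 50 header is what shows this second response came from the edge cache: it matches the 50 seconds between the two Date values, and it leaves almost the entire max-age window still fresh. A simplified sketch of that freshness arithmetic, loosely following the HTTP caching rules (it only looks at max-age and ignores other directives such as s-maxage; parse_max_age and remaining_freshness are hypothetical helper names):

```python
from email.utils import parsedate_to_datetime


def parse_max_age(cache_control):
    """Return the max-age value (seconds) from a Cache-Control header, or None."""
    for directive in cache_control.split(","):
        name, _, value = directive.strip().partition("=")
        if name.lower() == "max-age" and value.isdigit():
            return int(value)
    return None


def remaining_freshness(cache_control, age):
    """Seconds until a cached copy becomes stale: max-age minus Age, floored at 0."""
    max_age = parse_max_age(cache_control)
    if max_age is None:
        return None
    return max(max_age - age, 0)


# Values from the Cloudfront HIT response above.
remaining = remaining_freshness("max-age=31536000", age=50)

# Age should match the gap between the MISS and HIT Date headers.
miss_date = parsedate_to_datetime("Sun, 26 Oct 2014 09:18:31 GMT")
hit_date = parsedate_to_datetime("Sun, 26 Oct 2014 09:19:21 GMT")
gap = (hit_date - miss_date).total_seconds()

print(remaining)  # 31535950
print(gap)        # 50.0
```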

My nginx server responds as follows:

Cache-Control:max-age=31536000
Content-Length:4362
Content-Type:image/jpeg
Date:Sun, 26 Oct 2014 09:18:11 GMT
Etag:"cc095261a9340535996fad26a9a882e9fdfc6b47"
Expires:Mon, 26 Oct 2015 09:18:11 GMT
Server:nginx/1.4.6 (Ubuntu)

Why does Cloudfront not keep my objects as long as indicated, even though max-age is set? Many thanks in advance.

sullivan
  • It's possible you are not hitting the same Cloudfront location. Each location caches files individually, so until every location has the file you want cached, requests may still be retrieved from the origin. – datasage Oct 29 '14 at 15:03
  • I've tried it several times and even created a small Java app, and it seems to me that the cache gets flushed. I only set up max-age after some time, but I assume it will override the old setting on existing elements? – sullivan Oct 29 '14 at 16:04

1 Answer

Your second request shows that the object was indeed cached. I assume you see that, but the question doesn't make it clear.

The Cache-Control: max-age directive only specifies the maximum age of your objects in the Cloudfront cache at any particular edge location. There is no minimum time interval for which your objects are guaranteed to persist; after all, Cloudfront is a cache, which is volatile by definition.

If an object in an edge location isn't frequently requested, CloudFront might evict the object—remove the object before its expiration date—to make room for objects that are more popular.

http://docs.aws.amazon.com/AmazonCloudFront/latest/DeveloperGuide/Expiration.html
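
To make "max-age is a ceiling, not a floor" concrete, here is a toy edge cache in Python. Cloudfront does not document its eviction algorithm, so the least-recently-used policy and the tiny fixed capacity below are assumptions chosen purely for illustration; the point is only that an object with a one-year TTL can disappear after seconds when more popular objects need the space.

```python
from collections import OrderedDict


class TTLLRUCache:
    """Toy edge cache: entries expire after their TTL, but may be evicted
    earlier (least recently used first) when capacity runs out.
    The LRU policy is an assumption; Cloudfront's real policy is not public."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = OrderedDict()  # key -> absolute expiry time

    def get(self, key, now):
        expiry = self.entries.get(key)
        if expiry is None or now >= expiry:
            self.entries.pop(key, None)  # drop the stale entry, if any
            return False  # miss: the request would go to the origin
        self.entries.move_to_end(key)  # mark as recently used
        return True  # hit

    def put(self, key, now, ttl):
        self.entries[key] = now + ttl
        self.entries.move_to_end(key)
        while len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # evict the least recently used


one_year = 31536000
cache = TTLLRUCache(capacity=2)
cache.put("rare.jpg", now=0, ttl=one_year)
cache.put("popular-1.jpg", now=1, ttl=one_year)
cache.put("popular-2.jpg", now=2, ttl=one_year)  # over capacity: evicts rare.jpg

rare_hit = cache.get("rare.jpg", now=3)          # evicted long before expiry
popular_hit = cache.get("popular-2.jpg", now=3)
print(rare_hit, popular_hit)  # False True
```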

Additionally, there is no concept of Cloudfront as a whole having a copy of your object. Each edge location's cache appears to operate independently of the others, so it's not uncommon to see multiple requests for relatively popular objects coming from different Cloudfront edge locations.
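
A small simulation shows why the origin can see several requests for the same thumbnail even when every edge honors max-age: each edge starts empty and fetches from the origin on its own first miss. The edge names are hypothetical, and expiry and eviction are deliberately left out so that only the independence of the caches is modeled.

```python
def simulate(requests):
    """Replay (edge, key) requests against independent per-edge caches.

    Returns the Hit/Miss result of each request and the list of origin fetches.
    Expiry and eviction are ignored; only cache independence is modeled.
    """
    caches = {}  # edge name -> set of keys cached at that edge only
    origin_fetches = []
    results = []
    for edge, key in requests:
        cache = caches.setdefault(edge, set())
        if key in cache:
            results.append((edge, "Hit"))
        else:
            origin_fetches.append((edge, key))  # this edge must ask the origin
            cache.add(key)
            results.append((edge, "Miss"))
    return results, origin_fetches


# Hypothetical edge names; the same thumbnail requested via three edges.
requests = [
    ("edge-a", "thumb.jpg"),
    ("edge-b", "thumb.jpg"),
    ("edge-c", "thumb.jpg"),
    ("edge-a", "thumb.jpg"),
]
results, fetches = simulate(requests)
print(len(fetches))  # 3
print(results[-1])   # ('edge-a', 'Hit')
```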

If you are trying to reduce the load on your back-end server, it might make sense to place a cache that you control in front of it, such as Varnish, Squid, another nginx instance, or a custom solution, which is how I'm accomplishing this in my systems.

Alternatively, you could store every result in S3 after processing, and then configure your existing server to check S3 first, before attempting the work of resizing the object again.
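
The "check S3 first" idea is a read-through cache. Below is a minimal sketch under stated assumptions: the store is any dict-like object (a plain dict stands in for S3 here; a real deployment might wrap boto3's get_object and put_object calls), and render is a hypothetical function standing in for the actual resize, for example forwarding the request to Thumbor.

```python
from typing import Callable, MutableMapping


def get_thumbnail(
    key: str,
    store: MutableMapping[str, bytes],
    render: Callable[[str], bytes],
) -> bytes:
    """Read-through lookup: return a stored result if present, otherwise
    do the expensive resize once and persist the result."""
    cached = store.get(key)
    if cached is not None:
        return cached        # already processed: skip the resize entirely
    result = render(key)     # hypothetical: e.g. forward the request to Thumbor
    store[key] = result      # save it so future requests never re-render
    return result


# Usage with in-memory stand-ins for S3 and the resizer.
calls = []


def fake_render(key):
    calls.append(key)
    return b"resized:" + key.encode()


s3_stand_in = {}
get_thumbnail("300x200/photo.jpg", s3_stand_in, fake_render)
get_thumbnail("300x200/photo.jpg", s3_stand_in, fake_render)
print(len(calls))  # 1
```

Unlike an edge cache, the store never evicts on its own, so each distinct thumbnail is rendered at most once no matter how often Cloudfront comes back for it.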


Then why is there a documented "minimum" TTL?

On the same page quoted above, you'll also find this:

For web distributions, if you add Cache-Control or Expires headers to your objects, you can also specify the minimum amount of time that CloudFront keeps an object in the cache before forwarding another request to the origin.

I can see why this, and the console tooltip text quoted in the comment below...

The minimum amount of time (in seconds) that an object is in a CloudFront cache before CloudFront forwards another request to your origin to determine whether an updated version is available. 

...would seem to contradict my answer. There is no contradiction, however.

The minimum TTL, in simple terms, establishes a lower bound for Cloudfront's internal interpretation of Cache-Control: max-age, overriding (within Cloudfront only) any smaller value sent by the origin server. The server says to cache it for 1 day at most, but the configured minimum TTL is 2 days? Cloudfront disregards the max-age header it saw and may not check the origin again on subsequent requests for the next 2 days, rather than checking again after 1 day.
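
The rule just described fits in one line: the lifetime Cloudfront works with is the larger of the origin's max-age and the distribution's Minimum TTL. The sketch below models only those two values as described in this answer (any other distribution settings are not modeled), and the result is an upper bound on how long a copy may be served, not a promise that it stays cached.

```python
DAY = 86400  # seconds


def effective_ttl(max_age, min_ttl):
    """Longest time a cached copy may be served without asking the origin,
    per the description above: the larger of max-age and Minimum TTL.
    An upper bound only; the edge may still evict the object earlier."""
    return max(max_age, min_ttl)


print(effective_ttl(max_age=1 * DAY, min_ttl=2 * DAY) // DAY)  # 2
print(effective_ttl(max_age=31536000, min_ttl=0) // DAY)       # 365
```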

The nature of a cache dictates the correct interpretation of all of the apparent ambiguity:

Your configuration limits how long Cloudfront MAY serve up cached copies of an object, and the point after which it SHOULD NOT continue to return the object from its cache. It does not mandate how long Cloudfront MUST maintain the cached copy, because Cloudfront MAY evict an object at any time.

If you set the Cache-Control header correctly, Cloudfront will treat the larger of max-age and your Minimum TTL as the longest amount of time you want it to serve the cached copy without consulting the origin server again.

As your site traffic increases, this should become less of an issue, since your objects will be more "popular," but fundamentally there is no way to mandate that Cloudfront maintain a copy of an object.

Michael - sqlbot
  • Thanks for your good feedback. I've set Min TTL to 31536000 in the admin console, which, at least as I understand it, does the following: "The minimum amount of time (in seconds) that an object is in a CloudFront cache before CloudFront forwards another request to your origin to determine whether an updated version is available. The default time is 24 hours. To change the time that an object is in the cache, configure your origin to add a Cache-Control max-age directive. See the Help." – sullivan Oct 30 '14 at 10:45
  • I can see why that seems to be saying something different than what it actually means. Updated answer. – Michael - sqlbot Oct 30 '14 at 12:05
  • So your advice would be to use an nginx cache or Varnish instead of a CDN. As far as I understood your updated post, it's not possible to force CloudFront to keep an object for x seconds. – sullivan Oct 30 '14 at 13:01
  • No, not "instead of." "In addition to." Cloudfront is very valuable and fast, but if your motivation is for your resizing server to see absolutely as few requests as possible, you may need a cache between cloudfront and the resizer. I developed a product/service that does this by using S3 as a "cache of infinite size." Cloudfront hits me, I check S3 and return the result if found, otherwise I send the request to the back end, return the response to the requester, then save a copy in S3 for future requests. – Michael - sqlbot Oct 30 '14 at 13:37
  • @Michael-sqlbot where can we find info on this service? Your profile has no link and 'sqlbot' seems to be your preferred internet name and not a company, when Googling. – Jos Nov 03 '14 at 07:58
  • Thanks Michael for your help. I developed a Groovy & Grails app with a RESTful resizing service, S3 backup and a CDN in front. But it would be helpful to hear how you achieved yours. – sullivan Nov 03 '14 at 08:41