
I have several million image files in an S3 storage bucket and I know they will never change. To optimize requests, I decided to add an Expires header to my files (as explained in Google's Page Speed rules).

The process of adding the headers to all my files is long and expensive, so I'd prefer not to repeat it. However, the HTTP RFC recommends setting the Expires header no more than one year in the future:

HTTP/1.1 servers SHOULD NOT send Expires dates more than one year in the future

...which means I would have to update my headers in one year.

My questions are:

Can I set my header values to a very far date (e.g. 01-01-2020) and go against the RFC recommendation? What is the risk of doing so?

Is there another solution to tell the clients that request my files to cache them indefinitely, without having to update anything in my Amazon S3 storage?

asked by Benjamin Simon (edited by Jens Bannmann)
  • Well, it's a SHOULD. If a server disobeys any SHOULD, it is at best _conditionally compliant_ as opposed to _unconditionally compliant_ w.r.t. HTTP 1.1 (RFC 2616, section 1.2). So technically it's OK. I don't know how S3 works but can't you just send a date of $(NOW + 1 year) instead of a fixed date? – musiKk May 05 '11 at 06:59
  • It's not possible to dynamically set headers when the client makes a request to S3, if that's what you mean. All headers have "static" values, and updating them requires an Amazon PUT request for each object (actually a CopyObjectRequest), which is a pain when you already have millions of files stored in your bucket. – Benjamin Simon May 05 '11 at 07:35
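For reference, a rough sketch of that per-object rewrite in Python using boto3's `copy_object` (the counterpart of the Java SDK's `CopyObjectRequest`). The bucket and key names are hypothetical; only the argument-building is shown as runnable code, with the actual call left as a comment since it needs credentials:

```python
# Sketch of the in-place "self-copy" that rewrites one object's headers.
# Bucket/key names below are hypothetical examples.

def self_copy_kwargs(bucket: str, key: str, content_type: str) -> dict:
    """Build the arguments for an in-place copy that replaces headers."""
    return {
        "Bucket": bucket,
        "Key": key,
        "CopySource": {"Bucket": bucket, "Key": key},
        "MetadataDirective": "REPLACE",  # required, or old headers are kept
        "CacheControl": "max-age=31536000, public",
        "ContentType": content_type,  # must be restated or it is lost
    }

# With AWS credentials configured, one PUT-priced request per object:
#   import boto3
#   s3 = boto3.client("s3")
#   s3.copy_object(**self_copy_kwargs("my-images", "photos/cat.png", "image/png"))
```

Note that `MetadataDirective="REPLACE"` wipes all user metadata and headers, so anything you want to keep (like Content-Type) has to be restated in the same call.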

2 Answers


You could also set the more modern header:

Cache-Control: max-age=31536000, public

Each user agent will, upon loading each image, be willing to keep it cached for an entire year before asking for a new copy. (The large integer there is 365 × 24 × 60 × 60 seconds.) If there are still browsers out there that do not understand Cache-Control, they might gradually disappear over the lifespan of your images!
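As a sketch, the year-in-seconds value and an RFC-compliant Expires date one year out can be computed with the Python standard library alone (nothing S3-specific is assumed here):

```python
from datetime import datetime, timedelta, timezone
from email.utils import format_datetime

MAX_AGE = 365 * 24 * 60 * 60  # one year in seconds
cache_control = f"max-age={MAX_AGE}, public"

# An Expires value exactly one year out, in the RFC 1123 date format
# HTTP requires (e.g. "Thu, 05 May 2012 06:59:00 GMT").
expires = format_datetime(
    datetime.now(timezone.utc) + timedelta(days=365), usegmt=True
)
```

Computing the date relative to "now" at upload time is what keeps each object within the RFC's one-year guideline.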

answered by Brandon Rhodes (edited by Jonathan Oliver)

You can set your header values to a maximum of 19 January 2038 (the largest 32-bit Unix timestamp). This is what Google did, for some time, for the expiry time of its tracking cookies.
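That cut-off comes from the largest signed 32-bit Unix timestamp, which is easy to verify:

```python
from datetime import datetime, timezone

max_ts = 2**31 - 1  # 2147483647, the largest signed 32-bit timestamp
limit = datetime.fromtimestamp(max_ts, tz=timezone.utc)
print(limit)  # 2038-01-19 03:14:07+00:00
```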

The only risk of doing so is that if one day, for some reason, you decide to change an image (or notice there is a problem with one or more), your clients will not download the new version. You decide whether it's worth taking the risk.

Other than that, I don't really see any potential problem.

answered by user703016
    The risk of a file being changed is eliminated if you name the file as the MD5 hash of its contents. Then if the file ever changes, the filename will be different, and browser caching will not apply since that filename has never been cached. – Lyle Jan 20 '12 at 02:47
  • Yes actually that's what I did (using md5 hashing for naming) – Benjamin Simon Jun 06 '12 at 08:32
  • I'm curious as to the MD5 approach and its benefits. Seems like its big disadvantage is identifying which image is which (unless you manage all that remotely). What advantages does it hold over, for example, just naming your files 'myfile.[timestamp].png'? – Bobby Jack Sep 28 '12 at 18:07
  • Another alternative is to append a GET parameter that is ignored by S3. Any time you want to make sure people are seeing the latest version, just bump the version number. The browser will treat it as a different file. E.g. myimage.png?v=2, myimage.png?v=3, etc – Greg Apr 18 '13 at 21:21
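The content-hash naming scheme from the comments can be sketched in a few lines; the extension handling here is just one possible convention:

```python
import hashlib

def hashed_name(data: bytes, ext: str) -> str:
    """Name a file after the MD5 of its contents.

    Any change to the bytes produces a different name, so browsers can
    cache the old name forever without ever masking a new version.
    """
    return hashlib.md5(data).hexdigest() + ext

print(hashed_name(b"", ".png"))
# d41d8cd98f00b204e9800998ecf8427e.png
```

The mapping from original names to hashed names has to be kept somewhere (a database or a manifest), which is the bookkeeping cost the `?v=2` query-string approach avoids.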