I have an application that is served from two apache2 servers, and I want to configure ETags on the static content. In the future I would also like to use a CDN. I have read that this is supposed to be a problem because the ETag value will differ from server to server:

The ETag format for Apache 1.3 and 2.x is inode-size-timestamp. Although a given file may reside in the same directory across multiple servers, and have the same file size, permissions, timestamp, etc., its inode is different from one server to the next.

So if you're using more than one web server to host your app (like 90% of the web apps you use every day do), it's supposed to be an issue. However, I see that Google uses ETags, and they certainly use multiple servers, a CDN, edge caching, etc. I get a 304 response for any cached Google content. How do they do it? How do you get around the multiple-server issue? Is there a way to configure this with Apache?

perrierism

2 Answers

5

Current practice is to remove ETags, for precisely the reasons given in the OP's post. Instead you can rely on the other caching headers, i.e. Cache-Control and Expires, and cache resources unconditionally: assume that static content at a given URL never changes, so when the content has to change, you give it a new URL too. Steve Souders built the case for this while at Yahoo!, and published a good book about this and other performance improvements.
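As a rough illustration of the "new content, new URL" approach, here is a minimal Python sketch. The function names (fingerprinted_name, far_future_headers), the one-year lifetime, and the choice of SHA-256 truncated to 12 hex characters are assumptions made for the example, not anything prescribed by Apache or by the book mentioned above.

```python
import hashlib
import os
import time
from email.utils import formatdate

ONE_YEAR = 365 * 24 * 60 * 60  # assumed cache lifetime, in seconds


def fingerprinted_name(path):
    """Return a file name with a content hash embedded, e.g. app.<hash>.css.

    Hypothetical helper: the name changes whenever the bytes change,
    so the old URL can be cached forever.
    """
    with open(path, "rb") as f:
        digest = hashlib.sha256(f.read()).hexdigest()[:12]
    root, ext = os.path.splitext(os.path.basename(path))
    return f"{root}.{digest}{ext}"


def far_future_headers(max_age=ONE_YEAR, now=None):
    """Build Cache-Control and Expires headers for unconditional caching."""
    now = time.time() if now is None else now
    return {
        "Cache-Control": f"public, max-age={max_age}",
        # HTTP dates are always expressed in GMT
        "Expires": formatdate(now + max_age, usegmt=True),
    }
```

Because the name depends only on the file contents, every server behind the load balancer (or the CDN origin) produces the same URL for the same file, and no ETag is needed to tell versions apart.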

You can use ETags if you want to; you'll just have to take good care that all servers are configured exactly alike, and that ETags are generated from something that's machine-independent. One way of doing that is to generate ETags from a hash of the file contents, or a hash of (filename + size), as James wrote.
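A minimal Python sketch of the content-hash idea follows. It is only an illustration under stated assumptions: content_etag and is_not_modified are hypothetical helper names, SHA-1 is an arbitrary choice of hash, and the If-None-Match handling is deliberately simplified (it splits on commas and ignores the weak W/ prefix).

```python
import hashlib


def content_etag(path, chunk_size=65536):
    """Return a quoted ETag derived only from the file's bytes.

    Nothing machine-specific (inode, path, hostname) goes into the hash,
    so identical files on different servers get identical ETags.
    """
    digest = hashlib.sha1()
    with open(path, "rb") as f:
        # Read in chunks so large files are not loaded into memory at once
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return '"' + digest.hexdigest() + '"'


def is_not_modified(if_none_match, etag):
    """Return True when the request's If-None-Match header matches etag.

    A True result means the server can answer 304 Not Modified.
    """
    if if_none_match is None:
        return False
    candidates = [c.strip() for c in if_none_match.split(",")]
    if "*" in candidates:
        return True
    # Weak comparison: drop a leading W/ before comparing the quoted tags
    stripped = [c[2:] if c.startswith("W/") else c for c in candidates]
    return etag in stripped
```

The trade-off is that every file has to be read once to compute its tag, so in practice the result would usually be cached or computed at deploy time.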

My guess (without any evidence) is that Google isn't using a third-party CDN; they are just using their own servers in their many datacenters worldwide. They then keep the configuration of their web servers consistent across the globe, and just use something like (last modified time + file size) as the basis of their ETag.

For the rest of us, not using ETags is IMHO simpler and better.

2

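You can configure Apache so that it doesn't use the inode when building the ETag. See the FileETag directive; for example, FileETag MTime Size builds the tag from the modification time and file size only. The files must then have the same modification time on every server for the ETags to match.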
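To show why dropping the inode helps, here is a small Python sketch that builds a tag from file metadata with and without the inode. It is an assumption-laden illustration: stat_etag is a hypothetical function, and its output is not meant to be byte-identical to what Apache emits (this sketch truncates the modification time to whole seconds, for instance).

```python
import os


def stat_etag(path, use_inode=False):
    """Build a quoted ETag from file metadata, as hex fields joined by '-'.

    With use_inode=True the tag follows the inode-size-timestamp layout
    and differs between machines; with use_inode=False it depends only
    on size and modification time.
    """
    st = os.stat(path)
    parts = []
    if use_inode:
        parts.append(st.st_ino)  # machine-specific: differs per server
    parts.append(st.st_size)
    parts.append(int(st.st_mtime))  # whole seconds, for simplicity
    return '"' + "-".join(format(p, "x") for p in parts) + '"'
```

Two copies of the same file that share a size and modification time get the same tag from stat_etag(path), while stat_etag(path, use_inode=True) differs between them. This only works if the deployment process preserves modification times when copying files to each server.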

James