
When building our web pages from different content sources, it may be necessary to pull some images from external servers (e.g. when incorporating an RSS feed), which may not be as fast or as well connected as our own data center. I would like to have a means to copy or proxy the files onto a server running at our site, to keep the load off the external servers, possibly changing the filename to hide the fact that the images were dynamically generated.

e.g. turn the following url

http://domain.de/content/query?file=foo/nr_1.gif

into something like this:

mydomain.net/static/domain.de/query_3ffile_3dfoo_2fnr_5f1.gif

This should honor ETags and If-Modified-Since, and change the Expires headers so the files become static and cacheable, regardless of what the originating server says.

I think I could build something like this using Varnish and another web server, but maybe there is a solution already available.
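To give an idea of the mapping I have in mind, here is a rough sketch of what the Varnish side could look like (the backend name and the _XX decode rules are just made up to match the example URLs above; exact syntax will depend on the Varnish version):

    vcl 4.0;

    # Hypothetical backend for the external image server.
    backend origin {
        .host = "domain.de";
        .port = "80";
    }

    sub vcl_recv {
        if (req.url ~ "^/static/domain\.de/") {
            # Map the static-looking path back onto the origin's dynamic URL,
            # undoing the _XX escapes from the example above
            # (_3f = '?', _3d = '=', _2f = '/', _5f = '_').
            set req.url = regsub(req.url, "^/static/domain\.de/", "/content/");
            set req.url = regsuball(req.url, "_3f", "?");
            set req.url = regsuball(req.url, "_3d", "=");
            set req.url = regsuball(req.url, "_2f", "/");
            set req.url = regsuball(req.url, "_5f", "_");
            set req.http.Host = "domain.de";
            set req.backend_hint = origin;
        }
    }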

This could be part of a CDN; however, I do not anticipate needing a real CDN, since we do not have many visitors from other countries.

Alex Lehmann

1 Answer


I would strongly recommend the use of a proxy such as Varnish or Squid rather than downloading the files and keeping them yourself, so that the proxy takes care of all of the cache expiry and other entertainments that make caching so much fun.

If you're trying to cache content that is dynamically generated and doesn't have proper expiry information in the response headers, you either need to get whoever's generating those pages to include appropriate headers (based on the rate of change of the data that makes up the page), or, if that really isn't possible, override the expiry times in your Varnish VCL file.
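For example, something along these lines would pin a TTL on the image URLs no matter what the origin sends (a sketch only, in VCL 4.0 syntax; the file extensions and the one-hour TTL are placeholders):

    sub vcl_backend_response {
        # Override whatever Expires/Cache-Control the origin sent for images
        # and cache them for a fixed period instead.
        if (bereq.url ~ "\.(gif|jpe?g|png)$") {
            set beresp.ttl = 1h;
            unset beresp.http.Expires;
            set beresp.http.Cache-Control = "public, max-age=3600";
        }
    }

Once the object is in the cache, Varnish can answer conditional requests (If-Modified-Since / If-None-Match) from the cached copy itself, so the origin only gets hit again when the object expires.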

"Caching" by retrieving everything to files and then serving them locally means you're probably going to be requesting a whole pile of content that is never actually going to be served to users (meaning that the load -- network traffic, CPU, disk, whatever -- on the origin server is potentially higher than it is now) and you're going to end up reimplementing a large part of what makes a caching proxy useful anyway (expiry, storage management, etc). It's just not worth it. There isn't anything like this out there (that I know of, anyway) because anyone smart enough to make something like this that isn't complete arse is smart enough to realise what a bad idea it is, and how they're just reimplementing the hard half of Squid anyway.

womble
    +1 agreed - use a proxy and write some rules for the proxy. – sybreon Sep 08 '09 at 00:47
  • My main point about using a regular proxy is that objects generated by more costly operations should get mapped to static files of our choosing, and that requires some rewriting of the URLs. Varnish would be OK, but how do you implement the mapping of the files? – Alex Lehmann Sep 08 '09 at 09:52
  • And my main point is that that is a stupid idea. Answer updated. – womble Sep 09 '09 at 02:12