
In the paper "Rules of Thumb in Data Engineering" from 2000, the authors suggest the rule:

"Cache web pages if there is any chance they will be re-referenced within their lifetime."

Does this still apply today? Bandwidth is cheap and fast, and websites are often dynamic, so I assume there are fewer cache hits (is that right?). Does it still make sense to cache websites? And if so, what data gets cached? I can imagine pictures or articles, but what about my personal Twitter page? And even for articles, there could be breaking news I want to be informed about, so the lifetime of data in the cache should be low.

Is this rule of thumb still relevant and how is it used in practice?

Brian Tompsett - 汤莱恩
Puckl

4 Answers


Caching is still very important today: your site reacts faster and your server can handle more clients.

Basically, you should let clients cache everything that is static, like CSS, JS, and images. If you need a file to be reloaded, you can add a new query string.
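A minimal sketch of that query-string trick, assuming a small helper that derives the version from the file's contents (the helper name and the /static/ path are illustrative, not part of the original answer):

    import hashlib

    def busted_url(filename: str) -> str:
        """Append a short content hash so a changed file gets a new URL."""
        with open(filename, "rb") as f:
            digest = hashlib.md5(f.read()).hexdigest()[:8]
        return f"/static/{filename}?v={digest}"

    # The page then references e.g. /static/style.css?v=3a5f9c1b; editing
    # the file changes the hash, so browsers fetch the new version.
    print(busted_url("style.css"))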

IMHO the greatest trick is to use an expiry date: clients won't request the file again until it has expired. The ETag is also very helpful for detecting changes in dynamic content.
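For illustration, here is a small Python sketch (my own, not from the answer) of the two headers mentioned: an expiry date so the client stops asking for a while, and an ETag so changes can still be detected cheaply. The one-hour lifetime is an arbitrary assumption:

    import hashlib
    import time
    from email.utils import formatdate

    def cache_headers(body: bytes, lifetime_seconds: int = 3600) -> dict:
        """Build response headers: a fixed lifetime plus an ETag for revalidation."""
        return {
            "Cache-Control": f"max-age={lifetime_seconds}",        # modern form
            "Expires": formatdate(time.time() + lifetime_seconds,
                                  usegmt=True),                    # legacy expiry date
            "ETag": '"' + hashlib.md5(body).hexdigest() + '"',     # cheap change detector
        }

    print(cache_headers(b"<html>...</html>"))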

You should also keep in mind that mobile networks are very slow compared with DSL and the like.

rekire

As other members of our community have already said, caching is a very important tool if you want to reduce server load and latency. I just want to share a couple of techniques that I had to learn during the last 2-3 weeks:

First thing: you can return a Last-Modified HTTP header in response to a request with If-Modified-Since (ETag/If-None-Match work the same way; the only difference is that an ETag is some hashed value while Last-Modified is a date). You just compare the If-Modified-Since and Last-Modified dates, and if the page, image, etc. is stale, you return it with a 200 OK status code. If the image hasn't been modified, you only return a 304 Not Modified status (and the image is served from the browser cache).
In this case you cache images on the client (browser).

You can also cache them on the server if you want to reduce, for example, the number of database queries. Or you can use both: server and client.
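Here is a rough standard-library sketch of that Last-Modified / If-Modified-Since handshake; the image path and port are assumptions made for the example:

    import os
    from email.utils import formatdate, parsedate_to_datetime
    from http.server import BaseHTTPRequestHandler, HTTPServer

    IMAGE_PATH = "logo.png"  # assumed example file

    class ConditionalGetHandler(BaseHTTPRequestHandler):
        def do_GET(self):
            mtime = int(os.path.getmtime(IMAGE_PATH))
            last_modified = formatdate(mtime, usegmt=True)

            since = self.headers.get("If-Modified-Since")
            if since:
                try:
                    cached = parsedate_to_datetime(since).timestamp()
                except (TypeError, ValueError):
                    cached = 0
                if mtime <= cached:
                    # Client's copy is still current: 304, no body sent.
                    self.send_response(304)
                    self.send_header("Last-Modified", last_modified)
                    self.end_headers()
                    return

            # Stale copy or first request: send the full image with 200 OK.
            with open(IMAGE_PATH, "rb") as f:
                body = f.read()
            self.send_response(200)
            self.send_header("Content-Type", "image/png")
            self.send_header("Last-Modified", last_modified)
            self.end_headers()
            self.wfile.write(body)

    HTTPServer(("", 8000), ConditionalGetHandler).serve_forever()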

Second thing: I was struggling with this approach for the last 2 weeks.
I set the Expires response header far into the future (e.g. 1 year) and cache images on the client. If the image has changed, I build a new URL (which contains the hash of the image's last write date).

I created a Windows service that monitors the images folder, and if an image has been changed it writes the last modified date into the database. Then I add this last modified date to the image's URL.

The benefit of this approach is that the server is hit only when the image has been changed.

E.g. in the first approach described above, the browser has to send a request to the server to verify whether the image is fresh before getting the image from its own cache.
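A rough sketch of this second approach, assuming a hash of the file's last write time is embedded in the URL path (the folder layout and hash length are my own choices, not from the answer):

    import hashlib
    import os

    def fingerprinted_url(image_path: str) -> str:
        """Build a URL that changes whenever the file's last write date changes."""
        stamp = str(os.path.getmtime(image_path)).encode()
        fingerprint = hashlib.md5(stamp).hexdigest()[:12]
        return f"/images/{fingerprint}/{os.path.basename(image_path)}"

    # The page embeds e.g. /images/5f1d3a9c0b2e/logo.png and the response
    # carries a far-future Expires header; when the file is rewritten the
    # URL changes, so the old copy can safely be cached "forever".
    print(fingerprinted_url("images/logo.png"))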

Third thing: don't use the Expires HTTP header on its own (without implementing Last-Modified or ETag), because if you cache, for instance, an image on the client, the browser will take this image from its cache until the time expires. So if the image has been modified, users will not see the new version until then.

Hope my little experience with caching will help you :)

P.S. cache everything that is possible!

Aleksei Chepovoi

Caching is relevant for static data like images, CSS stylesheets, JavaScript code files, etc., because it allows the page to load faster on repeated hits and on subpages. It also applies to static HTML pages, but as you noted, many pages are now dynamic and announce via HTTP headers that they shouldn't be cached.
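For illustration, this is roughly what such headers look like (a hedged sketch, not taken from the answer):

    # Typical "do not cache me" headers a dynamic page might send.
    NO_CACHE_HEADERS = {
        "Cache-Control": "no-store, no-cache, must-revalidate",
        "Pragma": "no-cache",  # for old HTTP/1.0 caches
        "Expires": "0",
    }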

Beli

Yeah, it is important... especially because it provides a way of retrieving information from websites that have recently gone down.

Nick